On Thu, 2016-08-04 at 13:55 -0600, Jeff Law wrote:
> On 08/03/2016 09:45 AM, David Malcolm wrote:
> > This adds fix-it hints to c-format.c so that it can (sometimes)
> > suggest
> > the format string the user should have used.
> > 
> > The patch adds selftests for the new code in c-format.c.  These
> > selftests are thus lang-specific.  This is the first time we've had
> > lang-specific selftests, and hence the patch also adds a langhook
> > for
> > running them.  (Note that currently the Makefile only invokes the
> > selftests for cc1).
> > 
> > Successfully bootstrapped&regrtested in conjunction with the rest
> > of the
> > patch kit on x86_64-pc-linux-gnu.
> > 
> > (The v2 version of the patch had a successful selftest run for
> > stage 1 on
> > powerpc-ibm-aix7.1.3.0 (gcc111) in conjunction with the rest of the
> > patch
> > kit, and a successful build of stage1 for all targets via config
> > -list.mk;
> > the patch has only been rebased since)
> > 
> > OK for trunk if it passes testing?
> > 
> > gcc/c-family/ChangeLog:
> >     PR c/64955
> >     * c-common.h (selftest::c_format_c_tests): New declaration.
> >     (selftest::run_c_tests): New declaration.
> >     * c-format.c: Include "selftest.h.
> >     (format_warning_va): Add param "corrected_substring" and use
> >     it to add a replacement fix-it hint.
> >     (format_warning_at_substring): Likewise.
> >     (format_warning_at_char): Update for new param of
> >     format_warning_va.
> >     (check_format_info_main): Pass "fki" to check_format_types.
> >     (check_format_types): Add param "fki" and pass it to
> >     format_type_warning.
> >     (deref_n_times): New function.
> >     (get_modifier_for_format_len): New function.
> >     (selftest::test_get_modifier_for_format_len): New function.
> >     (get_format_for_type): New function.
> >     (format_type_warning): Add param "fki" and use it to attempt
> >     to provide hints for argument types when calling
> >     format_warning_at_substring.
> >     (selftest::get_info): New function.
> >     (selftest::assert_format_for_type_streq): New function.
> >     (ASSERT_FORMAT_FOR_TYPE_STREQ): New macro.
> >     (selftest::test_get_format_for_type_printf): New function.
> >     (selftest::test_get_format_for_type_scanf): New function.
> >     (selftest::c_format_c_tests): New function.
> > 
> > gcc/c/ChangeLog:
> >     PR c/64955
> >     * c-lang.c (LANG_HOOKS_RUN_LANG_SELFTESTS): If CHECKING_P, wire
> >     this up to selftest::run_c_tests.
> >     (selftest::run_c_tests): New function.
> > 
> > gcc/ChangeLog:
> >     PR c/64955
> >     * langhooks-def.h (LANG_HOOKS_RUN_LANG_SELFTESTS): New default
> >     do-nothing langhook.
> >     (LANG_HOOKS_INITIALIZER): Add LANG_HOOKS_RUN_LANG_SELFTESTS.
> >     * langhooks.h (struct lang_hooks): Add run_lang_selftests.
> >     * selftest-run-tests.c: Include "tree.h" and "langhooks.h".
> >     (selftest::run_tests): Call lang_hooks.run_lang_selftests.
> > 
> > gcc/testsuite/ChangeLog:
> >     PR c/64955
> >     * gcc.dg/format/diagnostic-ranges.c: Add fix-it hints to
> > expected
> >     output.
> So presumably we always use the type of the argument as the "correct"
> type and assume the format string is what needs to be fixed (with the
> exception of getting the right amount of *s handled).  That seems 
> intuitively the right thing to do, but do we have a hit rate better
> than 
> 50% in practice?
> 
> This is OK.  I'm really just curious about your thoughts/experiences
> on 
> the heuristics.

Our current behavior is to emit a warning of the form:

test.c:9:18: warning: format ‘%i’ expects argument of type ‘int’, but
argument 2 has type ‘const char *’ [-Wformat=]
   printf("hello %i", msg);
                  ^

Note that the text of the diagnostic tells the user the two types
involved within the message.

My experience with printf-style issues of this form is that I know want
expression I want to print, but I don't necessarily know exactly
whether it's say, an int vs a long: it's not that I want to print "an
int", it's that I wanted to print some specific expression.  Hence 
 (assuming this experience is typical) for mismatches of printf-style
calls, I think it's most helpful to the user to tell the user what
format code they need for the expression they want to print, which the
patch does, via the fix-it:

test.c:9:18: warning: format ‘%i’ expects argument of type ‘int’, but
argument 2 has type ‘const char *’ [-Wformat=]
   printf("hello %i",
msg);
                 ~^
                 %s

Note how the text of the diagnostic is unchanged; it's just the fixit that's 
new (and the underline, which is patch 3 of the kit).

For scanf-style calls this argument may not be so strong, but I think it still 
holds for the "do I have an int or a long?" cases (giving int * vs long *).   
It would be nice for scanf to detect the "you forgot to put an & in front of 
the destination lvalue" case and offer a fix-it hint for it, but I think that's 
a followup.

So I think it's likely to be much better than 50%, but that's a gut feeling, 
based on the above arguments.

I'm not aware of anyone who's done formal usability testing of a compiler's 
diagnostics.  I think there are two distinct types of activity:
(a) the edit->try to compile->edit->try to compile -> "it compiles!" cycle 
(nested within a debug cycle)
(b) compiling pre-existing code, perhaps with a different configuration than it 
was written on, often written by someone else

I'd be very interested in seeing usability studies of both aspects of a 
compiler, but in particular of (a): is there a published corpus somewhere of 
the kind of half-written non-compiling code that happens during (a) out there?  
(I know this invites various sarcastic responses, but I'm serious :) )

(For myself, I attempt to run with a relatively recent build of gcc trunk as my 
day-to-day compiler, and if I see a diagnostic that could be improved whilst 
I'm hacking on something I make a note of it, and try to come back to it later 
as an RFE; bug reports of this form are most welcome).

Dave

Reply via email to