Re: RFC: Monitoring old PRs, new dg directives

Marek Polacek via Gcc-patches Tue, 04 Aug 2020 14:35:08 -0700

Hi Mike,

thanks for your comments.


On Tue, Jul 28, 2020 at 06:37:26PM -0700, Mike Stump via Gcc-patches wrote:
> I'll punt to the the C++ front-end folks to chime in.  Usually we only check 
> in bugs that are fixed, as they are fixed, this is what makes it a regression 
> suite.  Doing this does have advantages, like, the testsuite is small and 
> doesn't have duplicates and doesn't test anything that is known to fail that 
> isn't an actual regression.

We also add sanity tests for new language features, for example.  I also like
adding (small) tests for DRs that happen to be already fixed to insure that the
compiler continues to behave as expected.  Avoiding duplicates is a good thing,
obviously, but, with this new scheme, if you fix something and while testing
the fix you find that we already have a test, you can choose not to introduce
a new test.  Whether or not a compiler bug is a regression shouldn't, IMHO, be
a criterion for (not) including the test.

tree(1) in testsuite/g++.dg says 70 directories, 13406 files.  If I go wild and
add 200 new C++ tests, that's ~1.5% increase in the number of tests.  That seems
reasonable.  If it still causes grief for some people, and we go with the idea
of using an unfixed/ directory, we could add GCC_TESTSUITE_TEST_UNFIXED envvar
to enable or disable running such tests.

> Ideally, it would be nice to have a way to test bugs out of bugzilla, and 
> report on those fixed bugs as they are fixed.  If people want to keep the 
> test suite a regression suite, then my counter proposal would be to have a 
> branch with the non-regression bugs on it and then people can checkout and 
> test that branch.  Most of the people don't (saving the testing time, which 
> is handy), and then more sporadically, the old bugs branch can be test and BZ 
> state can be moved along as bugs as fixed.  A run once a week or even once a 
> month would seem to be plenty often.

I'm afraid this would defeat the point of this proposal.  If the additional
tests are only available on a branch, no one is going to use it.  Quite
frankly, I'd just stick with my personal repo (where I can do whatever I want,
include test with #includes, unreduced tests, ...) rather than to bother with
rebasing a branch etc.  Even if someone would actually use such a branch from
time to time, we'd lose the benefit of "left shifting" bugs in the developer
workflow -- you would only notice that your patch had changed something days
or weeks after the commit at which point you likely have lost state on it.

> You in general don't need to check in fixes from bug reports that have been 
> fixed in the past, as those fixes generally already have a test case for the 
> fix that went in with the fix.

I agree, but experience shows that's not always the case.  Just the few PRs
I mentioned in my original mail prove that.  There are patches that change
something as a side-effect only, and, as a developer, you want to be aware
of those side-effects.

> As for how old is old, we'd leave that to the contributor of work to decide.  
> In theory, bugs can be added as soon as they come in, no need to wait.  To 
> the extent that waiting saves work, well, that's a personal choice, for the 
> person doing the work to choose.

I agree.  I'd leave it up to developers.  Just... be reasonable.  No need to
add every broken nonsense test lurking in Bugzilla.

> Why not just use xfail and xpass?  Seems less work than doing a setup_xfail.  
> Also, why not just use the existing directives instead of adding new 
> directives?  I'm suspicious of expect_ice and accepts_invalid.  You set them 
> to 1 all the time, but almost never set them to 0?  I'm wondering if it 
> should be more like shouldfail?

Good point, I missed that I could use xfail and xpass.  Fixed.

For accepts_invalid I think we *could* use the existing directives, if you know
where to expect an error.  But for ICEs we currently have nothing that would
work well.  I'll probably drop dg-accepts-invalid completely.

I followed shouldfail's suit when it comes to setting them to 1, so that should
be fine.

Thanks,
Marek

> On Jul 28, 2020, at 2:44 PM, Marek Polacek via Gcc-patches 
> <gcc-patches@gcc.gnu.org> wrote:
> > 
> > In Bugzilla, for the c++ component, we currently have over 3200 open bugs.  
> > In
> > my experience, a good amount of them have already been fixed; my periodical
> > sweeps always turn up a bunch of PRs that had already been fixed previously.
> > Sometimes my sweeps are more or less random, but more often than not I'm 
> > just
> > looking for duplicates of an existing PR.  Sometimes the reason the already
> > fixed PRs are still open is because a PR that was fixed had duplicates that 
> > we
> > didn't catch earlier when confirming the PR.  Sometimes a PR gets fixed as a
> > side-effect of fixing another PR.  Manual sweeps are tedious and 
> > time-consuming
> > because often you need to grab the test from the Bugzilla yet again (and
> > sometimes there are multiple tests).  Even if you find a PR that was fixed, 
> > you
> > still need to bisect the fix and perhaps add the test to our testsuite.  
> > That's
> > draining and since the number of bugs only increases, never decreases, it 
> > is not
> > sustainable.
> > 
> > So I've started a personal repo where I've gathered dozens of tests and 
> > wrote a
> > script that just compiles every test in the repo and reports if anything
> > changed.  One line from it:
> > 
> > pr=59798; $cxx $o -c $pr.C 2>&1 | grep -qE 'internal compiler error' || 
> > echo -e "$pr: ${msg_ice}"
> > 
> > This has major drawbacks: you have to remember to run this manually, keep
> > updating it, and it's yet another repo that people interested in this would
> > have to clone, but the worst thing is that typically you would only discover
> > that a patch fixed a different PR long after the patch was committed.  And
> > quite likely it wasn't even your patch.  We know that finding problems 
> > earlier
> > in the developer workflow reduces costs; if we can catch this before the
> > original developer commits & pushes the changes, it's cheaper, because the
> > developer already understands what the patch does.
> > 
> > A case in point: https://gcc.gnu.org/PR58156 which has been fixed recently
> > by an unrelated (?) patch.  Knowing that the tsubst_pack_expansion hunk in
> > the patch had this effect would probably have been very useful.  More 
> > testing
> > will lead to a better compiler.
> > 
> > Another case: https://gcc.gnu.org/35098 which was fixed 12 years (!) after
> > it was reported by a different change.
> > 
> > Or another: https://gcc.gnu.org/91525 where the patch contained a test, but
> > that was ice-on-invalid, whereas the test in PR91525 was ice-on-valid.
> > 
> > To alleviate some of these problems, I propose that we introduce a means to 
> > our
> > DejaGNU infrastructure that allows adding tests for old bugs that have not 
> > been
> > fixed yet, and re-introduce the keyword monitored (no longer used for 
> > anything
> > -- I think Volker stepped away) to the GCC Bugzilla to signal that a PR is
> > tracked in the testsuite.  I don't want any unnecessary moving tests 
> > around, so
> > the tests would go where they would normally go; they have to be reduced and
> > have proper targets, etc.  Having such tests in the testsuite means that 
> > when
> > something changes, you will know immediately, before you push any changes.
> > 
> > My thinking is that for:
> > 
> > * rejects-valid: use the existing dg-xfail-if
> > * accepts-valid: use the new dg-accepts-invalid
> > * ICEs: use the new dg-ice
> > 
> > dg-ice can be used like this:
> > 
> > // { dg-ice "build_over_call" { target c++11 } }
> > 
> > and it means that if the test still ICEs, you'll get a quiet XFAIL.  If the
> > ICE is fixed, you'll get an XPASS; if the ICE is gone but there are errors,
> > you'll get an XPASS + FAIL.  Then you can close the old PR.
> > 
> > Similarly, dg-accepts-invalid:
> > 
> > // { dg-accepts-invalid "PR86500" }
> > 
> > means that if the test still compiles without errors, you'll get a quiet 
> > XFAIL.
> > If we start giving errors, you'll get an XPASS.
> > 
> > If the bug is fixed, simply remove the directive.
> > 
> > The patch implementing these new directives is appended.  Once/if this is
> > accepted, I can start adding the old tests we have in our Bugzilla.  (I'm
> > only concerned about the c++ component, if that wasn't already clear.)
> > 
> > The question is what makes the bug "old": is it one year without it being
> > assigned?  6 months?  3 months?  Note: I *don't* propose to add every test 
> > for
> > every new PR, just the reasonably old ones that are useful/important.  Such
> > additions should be done in batches, so that we don't have dozens of 
> > commits,
> > each of them merely adding a single test.
> > 
> > We will still have a surfeit of bugs that we've given short shrift to, but
> > let's at least automate what we can.  The initial addition of the relevant
> > old(-ish) tests won't of course happen automagically, but it's a price I'm
> > willing to pay.  My goal here isn't merely to reduce the number of open PRs;
> > it is to improve the testing of the compiler overall.
> > 
> > Thoughts?
> > 
> > Bootstrapped/regtested on x86_64-pc-linux-gnu.
> > 
> > [PATCH] testsuite: Introduce dg-ice and dg-accepts-invalid.
> > 
> > gcc/ChangeLog:
> > 
> >     * doc/sourcebuild.texi: Document dg-ice and dg-accepts-invalid.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> >     * lib/gcc-dg.exp (gcc-dg-test-1): Handle dg-ice and dg-accepts-invalid.
> >     * lib/prune.exp (prune_ices): New.
> >     * lib/target-supports-dg.exp (dg-accepts-invalid, dg-ice): New.
> > ---
> > gcc/doc/sourcebuild.texi                 | 19 +++++++
> > gcc/testsuite/lib/gcc-dg.exp             | 39 +++++++++++++-
> > gcc/testsuite/lib/prune.exp              |  9 ++++
> > gcc/testsuite/lib/target-supports-dg.exp | 69 ++++++++++++++++++++++++
> > 4 files changed, 134 insertions(+), 2 deletions(-)
> > 
> > diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> > index a7a922d84a2..636d21d30dd 100644
> > --- a/gcc/doc/sourcebuild.texi
> > +++ b/gcc/doc/sourcebuild.texi
> > @@ -1172,6 +1172,16 @@ Expect the execute step of a test to fail if the 
> > conditions (which are
> > the same as for @code{dg-skip-if}) are met.
> > @end table
> > 
> > +@subsubsection Expect the compiler to crash
> > +
> > +@table @code
> > +@item  @{ dg-ice @var{comment} [@{ @var{selector} @} [@{ 
> > @var{include-opts} @} [@{ @var{exclude-opts} @}]]] @}
> > +Expect the compiler to crash with an internal compiler error and return
> > +a nonzero exit status if the conditions (which are the same as for
> > +@code{dg-skip-if}) are met.  Used for tests that test bugs that have not 
> > been
> > +fixed yet.
> > +@end table
> > +
> > @subsubsection Expect the test executable to fail
> > 
> > @table @code
> > @@ -1234,6 +1244,15 @@ has the same effect as @samp{target}.
> > 
> > @item @{ dg-prune-output @var{regexp} @}
> > Prune messages matching @var{regexp} from the test output.
> > +
> > +@table @code
> > +@item  @{ dg-accepts-invalid @var{comment} [@{ @var{selector} @} [@{ 
> > @var{include-opts} @} [@{ @var{exclude-opts} @}]]] @}
> > +Expect the compiler to accept the test (even though it should be rejected 
> > with
> > +a compile-time error), if the conditions (which are the same as for
> > +@code{dg-skip-if}) are met.  Used for tests that test bugs that have not 
> > been
> > +fixed yet.
> > +@end table
> > +
> > @end table
> > 
> > @subsubsection Verify output of the test executable
> > diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
> > index 45d97024883..6478eda283b 100644
> > --- a/gcc/testsuite/lib/gcc-dg.exp
> > +++ b/gcc/testsuite/lib/gcc-dg.exp
> > @@ -308,13 +308,48 @@ proc gcc-dg-test-1 { target_compile prog do_what 
> > extra_tool_flags } {
> >     verbose "$target_compile $prog $output_file $compile_type $options" 4
> >     set comp_output [$target_compile "$prog" "$output_file" "$compile_type" 
> > $options]
> > 
> > +    global expect_ice
> >     # Look for an internal compiler error, which sometimes masks the fact
> >     # that we didn't get an expected error message.  XFAIL an ICE via
> >     # dg-xfail-if and use { dg-prune-output ".*internal compiler error.*" }
> >     # to avoid a second failure for excess errors.
> > -    if [string match "*internal compiler error*" $comp_output] {
> > +    # "Error reporting routines re-entered" ICE says "Internal" rather than
> > +    # "internal", so match that too.
> > +    if [string match {*[Ii]nternal compiler error*} $comp_output] {
> >     upvar 2 name name
> > -   fail "$name (internal compiler error)"
> > +   if { $expect_ice == 0 } {
> > +     fail "$name (internal compiler error)"
> > +   } else {
> > +     # We expected an ICE and we got it.  Emit an XFAIL.
> > +     setup_xfail "*-*-*"
> > +     fail "$name (internal compiler error)"
> > +     clear_xfail "*-*-*"
> > +     # Prune the ICE from the output.
> > +     set comp_output [prune_ices $comp_output]
> > +   }
> > +    } else {
> > +   upvar 2 name name
> > +   global accepts_invalid
> > +   if { $expect_ice == 1 } {
> > +     # We expected an ICE but we didn't get it.  We want an XPASS, so
> > +     # call setup_xfail to set xfail_flag.
> > +     setup_xfail "*-*-*"
> > +     pass "$name (internal compiler error)"
> > +     clear_xfail "*-*-*"
> > +   } elseif { $accepts_invalid == 1 } {
> > +       if [string match {*error: *} $comp_output] {
> > +         # We expected that this test be (wrongly) accepted, but now we 
> > have
> > +         # seen error(s).  Issue an XPASS to signal that.
> > +         setup_xfail "*-*-*"
> > +         pass "$name (accepts invalid)"
> > +         clear_xfail "*-*-*"
> > +       } else {
> > +         # This test is still (wrongly) accepted.  Just emit an XFAIL.
> > +         setup_xfail "*-*-*"
> > +         fail "$name (accepts invalid)"
> > +         clear_xfail "*-*-*"
> > +       }
> > +   }
> >     }
> > 
> >     if { $do_what == "repo" } {
> > diff --git a/gcc/testsuite/lib/prune.exp b/gcc/testsuite/lib/prune.exp
> > index 1c776249f1a..58a739684a5 100644
> > --- a/gcc/testsuite/lib/prune.exp
> > +++ b/gcc/testsuite/lib/prune.exp
> > @@ -118,6 +118,15 @@ proc prune_file_path { text } {
> >     return $text
> > }
> > 
> > +# Prune internal compiler error messages, including the "Please submit..."
> > +# footnote.
> > +
> > +proc prune_ices { text } {
> > +  regsub -all "(^|\n)\[^\n\]*: internal compiler error:.*for 
> > instructions\[^\n\]*" $text "" text
> > +  regsub -all "(^|\n|')*Internal compiler error:.*for 
> > instructions\[^\n\]*" $text "" text
> > +  return $text
> > +}
> > +
> > # Provide a definition of this if missing (delete after next dejagnu 
> > release).
> > 
> > if { [info procs prune_warnings] == "" } then {
> > diff --git a/gcc/testsuite/lib/target-supports-dg.exp 
> > b/gcc/testsuite/lib/target-supports-dg.exp
> > index 2a21424b890..765f3a2e27a 100644
> > --- a/gcc/testsuite/lib/target-supports-dg.exp
> > +++ b/gcc/testsuite/lib/target-supports-dg.exp
> > @@ -495,6 +495,75 @@ proc dg-shouldfail { args } {
> >     }
> > }
> > 
> > +# Record whether the compiler is expected (at the moment) to ICE.
> > +# Used for tests that test bugs that have not been fixed yet.
> > +
> > +set expect_ice 0
> > +set accepts_invalid 0
> > +
> > +proc dg-ice { args } {
> > +    # Don't bother if we're already skipping the test.
> > +    upvar dg-do-what dg-do-what
> > +    if { [lindex ${dg-do-what} 1] == "N" } {
> > +      return
> > +    }
> > +
> > +    global accepts_invalid
> > +    # Can't be combined with dg-accepts-invalid.
> > +    if { $accepts_invalid == 1 } {
> > +      error "dg-ice: cannot be combined with dg-accepts-invalid"
> > +      return
> > +    }
> > +
> > +    global expect_ice
> > +
> > +    set args [lreplace $args 0 0]
> > +    if { [llength $args] > 1 } {
> > +   set selector [list target [lindex $args 1]]
> > +   if { [dg-process-target-1 $selector] == "S" } {
> > +       # The target matches, now check the flags.
> > +       if [check-flags $args] {
> > +           set expect_ice 1
> > +       }
> > +   }
> > +    } else {
> > +   set expect_ice 1
> > +    }
> > +}
> > +
> > +# Record whether the compiler should reject the testcase with an error,
> > +# but currently doesn't do so.  Used for accepts-invalid bugs.
> > +
> > +proc dg-accepts-invalid { args } {
> > +    # Don't bother if we're already skipping the test.
> > +    upvar dg-do-what dg-do-what
> > +    if { [lindex ${dg-do-what} 1] == "N" } {
> > +      return
> > +    }
> > +
> > +    global expect_ice
> > +    # Can't be combined with dg-ice.
> > +    if { $expect_ice == 1 } {
> > +      error "dg-accepts-invalid: cannot be combined with dg-ice"
> > +      return
> > +    }
> > +
> > +    global accepts_invalid
> > +
> > +    set args [lreplace $args 0 0]
> > +    if { [llength $args] > 1 } {
> > +   set selector [list target [lindex $args 1]]
> > +   if { [dg-process-target-1 $selector] == "S" } {
> > +       # The target matches, now check the flags.
> > +       if [check-flags $args] {
> > +           set accepts_invalid 1
> > +       }
> > +   }
> > +    } else {
> > +   set accepts_invalid 1
> > +    }
> > +}
> > +
> > # Intercept the call to the DejaGnu version of dg-process-target to
> > # support use of an effective-target keyword in place of a list of
> > # target triplets to xfail or skip a test.
> > 
> > base-commit: f3665bd1111c1799c0421490b5e655f977570354
> > -- 
> > 2.26.2
> >

Re: RFC: Monitoring old PRs, new dg directives

Reply via email to