TLDR: I'd like to propose adding a dependency on a modern unit testing
framework to make it easier to write unit tests within GCC. Before I spend much
more time on it, what sort of buy-in should I get? Are there any people in
particular I should work more closely with as I make this change?
Terminology: Within GCC, there are two types of tests in place: unit tests and
regression tests. The unit tests have been written with a home-grown selftest
framework and run as part of the build process. Any unit test failure results
in no compiler being produced. The regression tests, on the other hand,
run after build, and use the separate DejaGnu framework. In this email, I am
only concerning myself with the unit tests, and throughout the remainder of the
email, any mention of tests refers to these.
Working on GCC, I wanted to add some new unit tests to my feature as I went,
but I noticed that there is a good deal of friction involved. Right now, adding
new unit tests requires writing the test method, then modifying a second place
in the code to call said test method, repeating as necessary until getting all
the way to either the selftest.c file or the target hook. There is also no way
to do test setup/teardown automatically. Everything is manual.
I'd like to propose adding a dependency on a modern open-source unit testing
framework as an enhancement to the current selftest system. I have used Catch2
(https://github.com/catchorg/Catch2, Boost Software License 1.0) with great
success in the past. I experimented with adding it to GCC and converting a
handful of tests to use Catch2. Although I only converted a small number of
tests, I didn't see any performance impact during selftest. As a bonus, while
doing so, I found that one test I had written previously wasn't being run,
because I had failed to call it manually.
Some nice things that Catch2 provides are better error reporting (see below for
a comparison), ease of adding new tests (just include the header and write a
TEST_CASE(), as opposed to the manual plumbing required right now), extension
points for adding custom comparisons (I could see this being very useful to
expand on the current rtl test macros), and the ability to run a subset of the
tests without recompiling. It is also easy to integrate Catch2 with the
existing selftest framework.
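
To make the "no manual plumbing" point concrete, here is a minimal sketch of
the self-registration idea that frameworks of this kind rely on. It is not
Catch2's implementation and not GCC's selftest code; every name in it
(SKETCH_TEST, registrar, registry, run_registered_tests) is made up for
illustration. A static object's constructor adds each test to a list, so
writing the test is enough to get it run, and a name filter gives a rough idea
of how a subset can be selected at run time.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical names throughout; this is neither Catch2's nor GCC's code.
struct test_entry
{
  std::string name;
  bool (*fn) ();   // returns true on success
};

// A function-local static avoids static-initialization-order problems.
static std::vector<test_entry> &
registry ()
{
  static std::vector<test_entry> tests;
  return tests;
}

// Constructing one of these at namespace scope adds a test to the registry
// before main () runs.
struct registrar
{
  registrar (const char *name, bool (*fn) ())
  {
    assert (fn != nullptr);
    registry ().push_back ({name, fn});
  }
};

// Declare the test function, register it, then open its definition.
#define SKETCH_TEST(fn_name)                                  \
  static bool fn_name ();                                     \
  static registrar fn_name##_registrar (#fn_name, fn_name);   \
  static bool fn_name ()

SKETCH_TEST (test_addition)
{
  return 2 + 2 == 4;
}

SKETCH_TEST (test_string_length)
{
  return std::string ("gcc").size () == 3;
}

// Run every registered test whose name contains FILTER (empty means all).
// Returns the number of failures.
static int
run_registered_tests (const std::string &filter = "")
{
  int failures = 0;
  for (const test_entry &t : registry ())
    {
      if (t.name.find (filter) == std::string::npos)
        continue;
      bool ok = t.fn ();
      std::printf ("%s: %s\n", t.name.c_str (), ok ? "PASS" : "FAIL");
      if (!ok)
        failures++;
    }
  return failures;
}
```

Calling run_registered_tests () from main () runs both tests, and
run_registered_tests ("string") runs only the second. Nothing outside the test
itself had to be edited to add it, which is exactly the step I forgot with the
current framework.
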
If this path seems useful to others, I'm happy to pursue it further. The work
items I see are:
1. Convert more tests to verify the claim that build performance is not degraded
2. Update the docs to list Catch2 as the new recommended way to write unit tests
3. If all of the target self-tests are converted, then we can remove the target
test hook. The same goes for the lang test hook.
One thing that would make Catch2 even more of a slam dunk is being able to
enable exceptions for the check builds. Then, running the unit tests
could report multiple failures at the same time instead of just aborting at the
first one. That said, even without enabling exceptions, Catch2 is on par with
the current selftest in terms of terminating at the first failure.
Another option is to use a test framework that doesn't use exceptions, such as
Google Test (https://github.com/google/googletest, BSD 3-Clause "New" or
"Revised" License). I personally think Catch2 is more flexible, or I would lead
with Google Test. For example, in Catch2, shared setup is done in place with
the tests themselves, with each subtest written as a nested SECTION, whereas in
GTest you have to write a test class that derives from ::testing::Test and
overrides SetUp().
In addition, the sections in Catch2 can be nested further, allowing several
related tests to build on each other.
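
For anyone who hasn't used sections, here is a toy model of the idea in plain
C++. It is only a sketch under my own assumptions: section_runner, enter () and
run_with_sections are hypothetical names, the sections are flat rather than
nested, and Catch2's real tracking is more elaborate. The point it shows is
that the test body is run once per section, so the setup at the top is
re-executed and each section starts from fresh state.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy model; not how Catch2 is actually implemented.
struct section_runner
{
  int target = 0;   // index of the section to enter on this pass
  int seen = 0;     // sections encountered so far on this pass

  // Returns true only for the one section selected for this pass.
  bool enter ()
  {
    return seen++ == target;
  }
};

// Log of what happened, so the behavior can be inspected.
static std::vector<std::string> events;

// A test body with shared setup and two flat sections.
static void
test_vector (section_runner &s)
{
  std::vector<int> v = {1, 2, 3};   // shared setup, re-run on every pass
  events.push_back ("setup");

  if (s.enter ())                    // plays the role of SECTION ("push")
    {
      v.push_back (4);
      events.push_back (v.size () == 4 ? "push ok" : "push FAILED");
    }
  if (s.enter ())                    // plays the role of SECTION ("clear")
    {
      v.clear ();
      events.push_back (v.empty () ? "clear ok" : "clear FAILED");
    }
}

// Call BODY once per section so that each section sees fresh setup state.
// Returns the number of passes made.
static int
run_with_sections (void (*body) (section_runner &))
{
  assert (body != nullptr);
  int passes = 0;
  int total = 0;
  do
    {
      section_runner s;
      s.target = passes;
      body (s);
      total = s.seen;
      passes++;
    }
  while (passes < total);
  return passes;
}
```

Calling run_with_sections (test_vector) makes two passes and leaves events as
"setup", "push ok", "setup", "clear ok": the second section sees the original
three-element vector, not the one the first section modified. With a fixture
class, the same sharing needs a separate class with a SetUp() override.
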
Here is some sample output for the case where all the tests are passing:
===============================================================================
All tests passed (25 assertions in 5 test cases)
And here is the output when a test fails:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
is a Catch v2.9.2 host application.
Run with -? for options
-------------------------------------------------------------------------------
test_set_range
-------------------------------------------------------------------------------
../../gcc/bitmap.c:2661
...............................................................................
../../gcc/bitmap.c:2668: FAILED:
REQUIRE( 6 == bitmap_count_bits (b) )
with expansion:
6 == 5
Catch will terminate because it needed to throw an exception.
The message was: Test failure requires aborting test!
terminate called without an active exception
../../gcc/bitmap.c:2668: FAILED:
{Unknown expression after the reported line}
due to a fatal error condition:
SIGABRT - Abort (abnormal termination) signal
===============================================================================
test cases: 2 | 1 passed | 1 failed
assertions: 5 | 3 passed | 2 failed
cc1: internal compiler error: Aborted
<long callstack>
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
(Note that at the moment it doesn't know the name of our application or it
would have prefixed "is a Catch..." with our app name).
Compare that to the output of the current test framework:
../../gcc/bitmap.c:2669: test_set_range: FAIL: ASSERT_EQ ((6),
(bitmap_count_bits (b)))
cc1: internal compiler error: in fail, at selftest.c:47
/bin/bash ../../gcc/../move-if-change tmp-macro_list macro_list
echo timestamp > s-macro_list
<long callstack>
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
Thanks,
Andrew