Re: [PATCH v9 1/2] Add condition coverage (MC/DC)

Jørgen Kvalsvik Thu, 22 Feb 2024 06:24:52 -0800

On 22/02/2024 14:26, Jan Hubicka wrote:

Hello,

This patch adds support in gcc+gcov for modified condition/decision
coverage (MC/DC) with the -fcondition-coverage flag. MC/DC is a type of
test/code coverage and it is particularly important for safety-critical
applicaitons in industries like aviation and automotive. Notably, MC/DC
is required or recommended by:


     * DO-178C for the most critical software (Level A) in avionics.
     * IEC 61508 for SIL 4.
     * ISO 26262-6 for ASIL D.

 From the SQLite webpage:

     Two methods of measuring test coverage were described above:
     "statement" and "branch" coverage. There are many other test
     coverage metrics besides these two. Another popular metric is
     "Modified Condition/Decision Coverage" or MC/DC. Wikipedia defines
     MC/DC as follows:

         * Each decision tries every possible outcome.
         * Each condition in a decision takes on every possible outcome.
         * Each entry and exit point is invoked.
         * Each condition in a decision is shown to independently affect
           the outcome of the decision.

     In the C programming language where && and || are "short-circuit"
     operators, MC/DC and branch coverage are very nearly the same thing.
     The primary difference is in boolean vector tests. One can test for
     any of several bits in bit-vector and still obtain 100% branch test
     coverage even though the second element of MC/DC - the requirement
     that each condition in a decision take on every possible outcome -
     might not be satisfied.

     https://sqlite.org/testing.html#mcdc

MC/DC comes in different flavours, the most important being unique
cause MC/DC and masking MC/DC - this patch implements masking MC/DC,
which is works well with short circuiting semantics, and according to
John Chilenski's "An Investigation of Three Forms of the Modified
Condition Decision Coverage (MCDC) Criterion" (2001) is as good as
unique cause at catching bugs.

Whalen, Heimdahl, and De Silva "Efficient Test Coverage Measurement for
MC/DC" describes an algorithm for determining masking from an AST walk,
but my algorithm figures this out from analyzing the control flow graph.
The CFG is considered a binary decision diagram and an evaluation
becomes a path through the BDD, which is recorded. Certain paths will
mask ("null out") the contribution from earlier path segments, which can
be determined by finding short circuit endpoints. Masking is the
short circuiting of terms in the reverse-ordered Boolean function, and
the masked terms do not affect the decision like short-circuited
terms do not affect the decision.

A tag/discriminator mapping from gcond->uid is created during
gimplification and made available through the function struct. The
values are unimportant as long as basic conditions constructed from a
single Boolean expression are given the same identifier. This happens in
the breaking down of ANDIF/ORIF trees, so the coverage generally works
well for frontends that create such trees.

Like Whalen et al this implementation records coverage in fixed-size
bitsets which gcov knows how to interpret. This takes only a few bitwise
operations per condition and is very fast, but comes with a limit on the
number of terms in a single boolean expression; the number of bits in a
gcov_unsigned_type (which is usually typedef'd to uint64_t). For most
practical purposes this is acceptable, and by default a warning will be
issued if gcc cannot instrument the expression.  This is a practical
limitation in the implementation, not the algorithm, so support for more
conditions can be added by also introducing arbitrary-sized bitsets.

In action it looks pretty similar to the branch coverage. The -g short
opt carries no significance, but was chosen because it was an available
option with the upper-case free too.

gcov --conditions:

         3:   17:void fn (int a, int b, int c, int d) {
         3:   18:    if ((a && (b || c)) && d)
conditions covered 3/8
condition  0 not covered (true false)
condition  1 not covered (true)
condition  2 not covered (true)
condition  3 not covered (true)
         1:   19:        x = 1;
         -:   20:    else
         2:   21:        x = 2;
         3:   22:}

gcov --conditions --json-format:

"conditions": [
     {
         "not_covered_false": [
             0
         ],
         "count": 8,
         "covered": 3,
         "not_covered_true": [
             0,
             1,
             2,
             3
         ]
     }
],

Expressions with constants may be heavily rewritten before it reaches
the gimplification, so constructs like int x = a ? 0 : 1 becomes
_x = (_a == 0). From source you would expect coverage, but it gets
neither branch nor condition coverage. The same applies to expressions
like int x = 1 || a which are simply replaced by a constant.

The test suite contains a lot of small programs and functions. Some of
these were designed by hand to test for specific behaviours and graph
shapes, and some are previously-failed test cases in other programs
adapted into the test suite.

gcc/ChangeLog:

        * builtins.cc (expand_builtin_fork_or_exec): Check
          condition_coverage_flag.
        * collect2.cc (main): Add -fno-condition-coverage to OBSTACK.
        * common.opt: Add new options -fcondition-coverage and
          -Wcoverage-too-many-conditions.
        * doc/gcov.texi: Add --conditions documentation.
        * doc/invoke.texi: Add -fcondition-coverage documentation.
        * function.cc (allocate_struct_function): Init cond_uids.
        * function.h (struct function): Add cond_uids.
        * gcc.cc: Link gcov on -fcondition-coverage.
        * gcov-counter.def (GCOV_COUNTER_CONDS): New.
        * gcov-dump.cc (tag_conditions): New.
        * gcov-io.h (GCOV_TAG_CONDS): New.
        (GCOV_TAG_CONDS_LENGTH): New.
        (GCOV_TAG_CONDS_NUM): New.
        * gcov.cc (class condition_info): New.
        (condition_info::condition_info): New.
        (condition_info::popcount): New.
        (struct coverage_info): New.
        (add_condition_counts): New.
        (output_conditions): New.
        (print_usage): Add -g, --conditions.
        (process_args): Likewise.
        (output_intermediate_json_line): Output conditions.
        (read_graph_file): Read condition counters.
        (read_count_file): Likewise.
        (file_summary): Print conditions.
        (accumulate_line_info): Accumulate conditions.
        (output_line_details): Print conditions.
        * gimplify.cc (next_cond_uid): New.
        (reset_cond_uid): New.
        (shortcut_cond_r): Set condition discriminator.
        (tag_shortcut_cond): New.
        (shortcut_cond_expr): Set condition discriminator.
        (gimplify_cond_expr): Likewise.
        (gimplify_function_tree): Call reset_cond_uid.
        * ipa-inline.cc (can_early_inline_edge_p): Check
          condition_coverage_flag.
        * ipa-split.cc (pass_split_functions::gate): Likewise.
        * passes.cc (finish_optimization_passes): Likewise.
        * profile.cc (struct condcov): New declaration.
        (cov_length): Likewise.
        (cov_blocks): Likewise.
        (cov_masks): Likewise.
        (cov_maps): Likewise.
        (cov_free): Likewise.
        (instrument_decisions): New.
        (read_thunk_profile): Control output to file.
        (branch_prob): Call find_conditions, instrument_decisions.
        (init_branch_prob): Add total_num_conds.
        (end_branch_prob): Likewise.
        * tree-core.h (struct tree_exp): Add condition_uid.
        * tree-profile.cc (struct conds_ctx): New.
        (CONDITIONS_MAX_TERMS): New.
        (EDGE_CONDITION): New.
        (topological_cmp): New.
        (index_of): New.
        (single_p): New.
        (single_edge): New.
        (contract_edge_up): New.
        (struct outcomes): New.
        (conditional_succs): New.
        (condition_index): New.
        (condition_uid): New.
        (masking_vectors): New.
        (emit_assign): New.
        (emit_bitwise_op): New.
        (make_top_index_visit): New.
        (make_top_index): New.
        (paths_between): New.
        (struct condcov): New.
        (cov_length): New.
        (cov_blocks): New.
        (cov_masks): New.
        (cov_maps): New.
        (cov_free): New.
        (find_conditions): New.
        (struct counters): New.
        (find_counters): New.
        (resolve_counter): New.
        (resolve_counters): New.
        (instrument_decisions): New.
        (tree_profiling): Check condition_coverage_flag.
        (pass_ipa_tree_profile::gate): Likewise.
        * tree.h (SET_EXPR_UID): New.
        (EXPR_COND_UID): New.

libgcc/ChangeLog:

        * libgcov-merge.c (__gcov_merge_ior): New.

gcc/testsuite/ChangeLog:

        * lib/gcov.exp: Add condition coverage test function.
        * g++.dg/gcov/gcov-18.C: New test.
        * gcc.misc-tests/gcov-19.c: New test.
        * gcc.misc-tests/gcov-20.c: New test.
        * gcc.misc-tests/gcov-21.c: New test.
        * gcc.misc-tests/gcov-22.c: New test.
        * gcc.misc-tests/gcov-23.c: New test.

diff --git a/gcc/function.cc b/gcc/function.cc
index 89841787ff8..e57d0488c7a 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -4793,6 +4793,7 @@ allocate_struct_function (tree fndecl, bool abstract_p)
cfun = ggc_cleared_alloc<function> ();+ cfun->cond_uids = hash_map <gcond*, unsigned>::create_ggc ();


There are a lot of struct functions created during WPA ICF stage, where
this hash will be completely unused.  I think you can initialize it to
NULL here, allocate it at the gimplification time only and also
deallocate in free_after_compilation.

It would be good to arrange the patch to do nothing unless the condition
coverage command line option is used.  At the moment it seems you
compute conditional uids just to ignore them later.

Also at the moment the hash will dangle pointers to statements that was
optimized out. There are two options. Either initialize it with
GTY((cache)) which will make garbage collector to remove entries to
statements not reachable otherwise.

I wonder, is that not already handled by using gcond_cache_map_traitstemplate parameter?

struct gcond_cache_map_traits


  : simple_cache_map_traits <ggc_cache_ptr_hash <gcond>, unsigned>
  {
  };

Of course, if the same thing can be accomplished by GTY((cached)) thatis much neater.

I have addressed the rest of the review, and will publish another draftsoon.


Other option would be move it out of gabrage collector completely and
extend gsi_remove to also remove entry in the hash just as it does so
This should also make it possible to move it out of ggc.
for EH (see calls to remove_stmt_from_eh_lp).

@@ -134,6 +136,33 @@ public:
    vector<unsigned> lines;
  };

+/* Describes a single conditional expression and the (recorded) conditions

+   shown to independently affect the outcome.  */
+class condition_info
+{
+public:
+  condition_info ();
+
+  int popcount () const;
+
+  /* Bitsets storing the independently significant outcomes for true and false,
+   * respectively.  */

Extra * here.

+  gcov_type_unsigned truev;
+  gcov_type_unsigned falsev;
+
+  /* Number of terms in the expression; if (x) -> 1, if (x && y) -> 2 etc.  */
+  unsigned n_terms;
+};

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 342e43a7f25..591cb50193c 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -71,6 +71,24 @@ along with GCC; see the file COPYING3.  If not see
  #include "context.h"
  #include "tree-nested.h"

By coding style there should be acomment what nextuid is used for.

+static unsigned nextuid = 1;
+/* Get a fresh identifier for a new condition expression.  This is used for
+   condition coverage.  */
+static unsigned
+next_cond_uid ()
+{
+    return nextuid++;
+}
+/* Reset the condition uid to the value it should have when compiling a new
+   function.  0 is already the default/untouched value, so start at non-zero.
+   A valid and set id should always be > 0.  This is used for condition
+   coverage.  */
+static void
+reset_cond_uid ()
+{
+    nextuid = 1;
+}
+
  /* Hash set of poisoned variables in a bind expr.  */

+/* Given a multi-term condition (ANDIF, ORIF), walk the predicate and tag every
+   term with uid.  When two basic conditions share the uid discriminator when
+   they belong to the same predicate, which used by the condition coverage.
+   Doing this as an explicit step makes for a simpler implementation than
+   weaving it into the splitting code as the splitting code eventually reaches
+   back up to gimplfiy_expr which makes bookkeeping complicated.  */


Coding style also asks to document parameters (and reffer to them in
uppercase, like PRED/CONDITION_UID)

+static void
+tag_shortcut_cond (tree pred, unsigned condition_uid)
+{
+  if (TREE_CODE (pred) == TRUTH_ANDIF_EXPR
+      || TREE_CODE (pred) == TRUTH_ORIF_EXPR)
+    {
+      tree fst = TREE_OPERAND (pred, 0);
+      tree lst = TREE_OPERAND (pred, 1);
+
+      if (TREE_CODE (fst) == TRUTH_ANDIF_EXPR
+         || TREE_CODE (fst) == TRUTH_ORIF_EXPR)
+       tag_shortcut_cond (fst, condition_uid);
+      else if (TREE_CODE (fst) == COND_EXPR)
+       SET_EXPR_UID (fst, condition_uid);
+
+      if (TREE_CODE (lst) == TRUTH_ANDIF_EXPR
+         || TREE_CODE (lst) == TRUTH_ORIF_EXPR)
+       tag_shortcut_cond (lst, condition_uid);
+      else if (TREE_CODE (lst) == COND_EXPR)
+       SET_EXPR_UID (lst, condition_uid);
+    }
+}
  /* Given a conditional expression EXPR with short-circuit boolean
     predicates using TRUTH_ANDIF_EXPR or TRUTH_ORIF_EXPR, break the
     predicate apart into the equivalent sequence of conditionals.  */

static tree

-shortcut_cond_expr (tree expr)
+shortcut_cond_expr (tree expr, unsigned condition_uid)

Similarly here CONDITION_UID should be documented.

+
+/* Compare two basic blocks by their order in the expression i.e. for (a || b)
+   then topological_cmp (a, b, ...) < 0.  The result is undefined if lhs, rhs
+   belong to different expressions.  The top_index argument should be the
+   top_index vector from ctx.  */

Please use LHS/RHS and TOP_INDEX in commants (in all functions you aded).
It also seems that you are not releasing the hash after its use.  It
leaks points to gimple statments that are removed, so it may

The patch look OK to me with the hash memory management change above.
I really do apologize for being so slow on the reviews - it is an
interesting feature and I should have handled it faster.

Honza

Re: [PATCH v9 1/2] Add condition coverage (MC/DC)

Reply via email to