analyzer: Weekly update on extending C++ support (2)

Benjamin Priour via Gcc Mon, 07 Aug 2023 06:06:05 -0700

Hi Dave,

I made some more progress on moving tests from gcc.dg/analyzer/ to
c-c++-common/analyzer.
I'll only detail the most noteworthy tests I encountered.


gcc.dg/analyzer/pr103892.c troubled me with an Excess error when compiled
with g++
analysis bailed out early (91 'after-snode' enodes; 314 enodes)
[-Wanalyzer-too-complex]

pr103892.c is compiled with optimization level -O2.
Analysis bails out when the number of "after-SN" enodes explodes, i.e.
exceeds a certain proportion of the total number of SG nodes.
limit(after-SN) = m_sg.num_nodes ()  * param-bb-explosion-factor.
The reason why C and C++ differs is simply due to what '-O2' does to each
of them.
Under -O0 or -O1, there is no failure whatsoever, under -O2 only C++ fails,
whereas with -O3 both gcc and g++ emits -Wanalyzer-too-complex.
With -O2, although GCC produces a greater total number of "after-SN" enodes
than G++, their proportion barely stays under limit(after-SN) as
the total number of SN is also bigger, hence no warning is emitted.

All in all, I don't believe there is a significant difference here between
C and C++, nor is there much we can do about this.
Therefore I'll simply add a { dg-warning "analysis bailed out early" "" {
target c++ } }

An interesting divergence between GCC and G++ was in the handling of
built-ins.
Tests gcc.dg/analyzer/{pr104062.c,sprintf-2.c} are failing when compiling
with G++.
FAIL: c-c++-common/analyzer/pr104062.c  leak of ap7 at line 13 (test for
warnings, line 12)
The reason is neither realloc nor sprintf are considered by G++ as a
built-in, contrary to GCC.
C built-ins are mapped within the analyzer to known_functions using their
'enum combined_fn' code
(See kf.cc:register_known_functions), not their function name.
Therefore, we find a built-in known function by checking
tree.h:fndecl_built_in_p returns true
and calling known_function_manager::get_normal_builtin.
The former returns false for 'realloc' and 'sprintf' when using G++.

To fix that, I've derived builtin_known_function from known_function.

/* Subclass of known_function for builtin functions.  */

class builtin_known_function : public known_function
{
public:
  virtual enum built_in_function builtin_code () const = 0;
  tree builtin_decl () const {
    gcc_assert (builtin_code () < END_BUILTINS);
    return builtin_info[builtin_code ()].decl;
  }

  virtual const builtin_known_function *
  dyn_cast_builtin_kf () const { return this; }
  virtual builtin_known_function *
  dyn_cast_builtin_kf () { return this; }
};

And 'kfm.add (BUILT_IN_REALLOC, make_unique<kf_realloc> ());' becomes
kfm.add ("realloc", make_unique<kf_realloc> ());
kfm.add ("__builtin_realloc", make_unique<kf_realloc> ());

Unfortunately we have to register the built-ins using their function name
which is more apt to bugs,
and the double call to kfm.add is quite messy. Having two instances of
kf_realloc however is not that troubling,
as builtin_known_function objects are lightweight.

That GCC built-ins are not recognized by G++ doesn't only impede their
detection,
but also how we process them, and what information we have of them.
In gcc.dg/analyzer/sprintf-2.c, sprintf signature follows the manual and
cppreference.
Yet we expect sm-malloc to warn about use of NULL when either of the
argument is NULL,
although there is no attribute(nonnull) specified.
In fact, sm-malloc isn't using the signature of sprintf as specified in the
test case, but rather the one provided
by GCC in builtins.def

/* See e.g. https://en.cppreference.com/w/c/io/fprintf
   and https://www.man7.org/linux/man-pages/man3/sprintf.3.html */

  extern int
  sprintf(char* dst, const char* fmt, ...)
    __attribute__((__nothrow__)); // No attribute(nonnull)

int
test_null_dst (void)
{
  return sprintf (NULL, "hello world"); /* { dg-warning "use of NULL where
non-null expected" } */
}

int
test_null_fmt (char *dst)
{
  return sprintf (dst, NULL);  /* { dg-warning "use of NULL where non-null
expected" } */
}

Signature in builtins.def: DEF_LIB_BUILTIN        (BUILT_IN_SPRINTF,
"sprintf", BT_FN_INT_STRING_CONST_STRING_VAR,
ATTR_NOTHROW_NONNULL_1_FORMAT_PRINTF_2_3)

My question is then : Should G++ behave as GCC and ignore the user's
signatures of built-ins, instead using the attributes specified by GCC ?
At the moment I went with a "yes". If the function is a builtin, I'm
operating on its builtin_known_function::builtin_decl () (see above) rather
than the callee_fndecl
deduced from a gimple call, therefore overriding the user's signature.

Doing so led to tests sprintf-2.c and realloc-1.c to pass both in GCC and
G++.


Last test I'd like to discuss is analyzer/pr99193-1.c

/* { dg-additional-options "-Wno-analyzer-too-complex" } */

/* Verify absence of false positive from -Wanalyzer-mismatching-deallocation
   on realloc(3).
   Based on
https://github.com/libguestfs/libguestfs/blob/f19fd566f6387ce7e4d82409528c9dde374d25e0/daemon/command.c#L115
   which is GPLv2 or later.  */

typedef __SIZE_TYPE__ size_t;
typedef __builtin_va_list va_list;

#ifdef __cplusplus
  #if __cplusplus >= 201103L
  #define NULL nullptr
  #else
  #define NULL 0
  #endif
#else
  #define NULL ((void *)0)
#endif

extern void *malloc (size_t __size)
  __attribute__ ((__nothrow__ , __leaf__))
  __attribute__ ((__malloc__))
  __attribute__ ((__alloc_size__ (1)));
extern void perror (const char *__s);
extern void *realloc (void *__ptr, size_t __size)
  __attribute__ ((__nothrow__ , __leaf__))
  __attribute__ ((__warn_unused_result__))
  __attribute__ ((__alloc_size__ (2)));

extern void guestfs_int_cleanup_free (void *ptr);
extern int commandrvf (char **stdoutput, char **stderror, unsigned flags,
                       char const* const *argv);
#define CLEANUP_FREE __attribute__((cleanup(guestfs_int_cleanup_free)))

int
commandrf (char **stdoutput, char **stderror, unsigned flags,
           const char *name, ...)
{
  va_list args;
  CLEANUP_FREE const char **argv = NULL;
  char *s;
  int i, r;

  /* Collect the command line arguments into an array. */
  i = 2;
  argv = (const char **) malloc (sizeof (char *) * i);

 if (argv == NULL) {
    perror ("malloc");
    return -1;
  }
  argv[0] = (char *) name;
  argv[1] = NULL;

  __builtin_va_start (args, name);

  while ((s = __builtin_va_arg (args, char *)) != NULL) {
    const char **p = (const char **) realloc (argv, sizeof (char *) *
(++i)); /* { dg-bogus "'free'" } */
    if (p == NULL) {
      perror ("realloc");
      __builtin_va_end (args); // (*)
      return -1;
    }
    argv = p;
    argv[i-2] = s;
    argv[i-1] = NULL;
  }

  __builtin_va_end (args);

  r = commandrvf (stdoutput, stderror, flags, argv);

  return r;
} // G++ emits Excess error here

Even without the prior changes to the processing of built-ins, va_end and
__builtin_va_end are recognized by G++.
Yet, G++ fails with an Excess error "warning: missing call to 'va_end'
[-Wanalyzer-va-list-leak]".

/home/vultkayn/Desktop/gcc_sources_perso/gcc/gcc/testsuite/c-c++-common/analyzer/pr99193-1.c:75:1:
warning: missing call to ‘va_end’ [-Wanalyzer-va-list-leak]
   75 | }
      | ^
  ‘int commandrf(char**, char**, unsigned int, const char*, ...)’: events
1-3
    |
    |   49 |  if (argv == NULL) {
    |      |  ^~
    |      |  |
    |      |  (1) following ‘false’ branch...
    |......
    |   53 |   argv[0] = (char *) name;
    |      |         ~
    |      |         |
    |      |         (2) ...to here
    |......
    |   56 |   __builtin_va_start (args, name);
    |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |                      |
    |      |                      (3) ‘va_start’ called here (state of ‘’:
‘start’ -> ‘started’, NULL origin, meaning: {verb: ‘acquire’, noun:
‘resource’})
    |
  ‘int commandrf(char**, char**, unsigned int, const char*, ...)’: events
4-9
    |
    |   58 |   while ((s = __builtin_va_arg (args, char *)) != NULL) {
    |      |                                                ^
    |      |                                                |
    |      |                                                (4) following
‘true’ branch...
    |   59 |     const char **p = (const char **) realloc (argv, sizeof
(char *) * (++i)); /* { dg-bogus "'free'" } */
    |      |
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |                                              |
            |
    |      |                                              |
            (5) ...to here
    |      |                                              (6) when
‘realloc’ fails
    |   60 |     if (p == NULL) {
    |      |     ~~
    |      |     |
    |      |     (7) following ‘true’ branch (when ‘p’ is NULL)...
    |   61 |       perror ("realloc");
    |      |       ~~~~~~~~~~~~~~~~~~
    |      |              |
    |      |              (8) ...to here
    |......
    |   75 | }
    |      | ~
    |      | |
    |      | (9) missing call to ‘va_end’ to match ‘va_start’ at (3) (in
global state ‘started’)
    |

The IPA doesn't differ much between C and C++ at this sec
// gcc -fdump-ipa-analyzer
  <bb 5> :
  i_44 = i_22 + 1;
  _8 = (long unsigned int) i_44;
  _9 = _8 * 8;
  argv.3_10 = argv;
  p_46 = realloc (argv.3_10, _9);
  if (p_46 == 0B)
    goto <bb 6>; [INV]
  else
    goto <bb 7>; [INV]

  <bb 6> :
  perror ("realloc");
  __builtin_va_end (&args);
  _52 = -1;
  // predicted unlikely by early return (on trees) predictor.
  goto <bb 10>; [INV]

// g++ -fdump-analyzer-ipa
  <bb 6> :
  i_48 = i_22 + 1;
  _8 = (long unsigned int) i_48;
  _9 = _8 * 8;
  argv.3_10 = argv;
  p_50 = realloc (argv.3_10, _9);
  if (p_50 == 0B) // if p == NULL
    goto <bb 7>; [INV]
  else
    goto <bb 9>; [INV]

  <bb 7> :
  perror ("realloc");

  <bb 8> :
  __builtin_va_end (&args);
  _56 = -1;
  // predicted unlikely by early return (on trees) predictor.
  goto <bb 13>; [INV]

If I comment out the call to perror("realloc"), then G++ behaves as GCC.
If I replace perror ("realloc") by any other call, to a fully-defined
function or whatever, then the warning reappears.
The above SSA differ by an extra basic block in G++ that splits GCC's <bb
6> in two.
However, that doesn't account for much, as changing
  if (p == NULL) {
      perror ("realloc");
      __builtin_va_end (args); // (*)
      return -1;
    }
to
    if (p == NULL) {
      int x;
      perror("realloc");
      if (x > 7)
        x = 12;
      __builtin_va_end (args);
      return -1;
    }
generates a similar basic block around __builtin_va_end, yet G++ keeps
emitting the false positive whereas GCC doesn't.
// GCC modified SSA
  <bb 6> :
  perror ("realloc");
  if (x_51(D) > 7)
    goto <bb 7>; [INV]
  else
    goto <bb 8>; [INV]

  <bb 7> :
  x_52 = 12;

  <bb 8> :
  __builtin_va_end (&args);
  _54 = -1;
  // predicted unlikely by early return (on trees) predictor.
  goto <bb 12>; [INV]

// G++ modified SSA
  <bb 7> :
  perror ("realloc");

  <bb 8> :
  if (x_55(D) > 7)
    goto <bb 9>; [INV]
  else
    goto <bb 10>; [INV]

  <bb 9> :
  x_56 = 12;

  <bb 10> :
  __builtin_va_end (&args);
  _58 = -1;
  // predicted unlikely by early return (on trees) predictor.
  goto <bb 15>; [INV]

Debugging and glaring at the exploded graph gave me no clue towards fixing
this test.
If you have any hint, I welcome it.

I hope the above isn't too indigest.

Cheers,
Benjamin.

analyzer: Weekly update on extending C++ support (2)

Reply via email to