Re: [PATCH 2/2] analyzer: out-of-bounds checker [PR106000]

2022-08-09 Thread David Malcolm via Gcc-patches
On Tue, 2022-08-09 at 23:19 +0200, Tim Lange wrote:
> This patch adds an experimental out-of-bounds checker to the
> analyzer.
> 
> The checker was tested on coreutils, curl, httpd and openssh. It is
> mostly
> accurate but does produce false-positives on yacc-generated files and
> sometimes when the analyzer misses an invariant. These cases will be
> documented in bugzilla.
> (Regrtests still running with the latest changes, will report back
> later.)

Hi Tim, thanks for the patch, and for all the testing you've done on
it.

We've already had several rounds of review of this off-list, and this
patch looks very close to ready.

Some nits below...

> diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
> index 5021376b6fb..8e73af60ceb 100644
> --- a/gcc/analyzer/analyzer.opt
> +++ b/gcc/analyzer/analyzer.opt
> @@ -158,6 +158,10 @@ Wanalyzer-tainted-size
>  Common Var(warn_analyzer_tainted_size) Init(1) Warning
>  Warn about code paths in which an unsanitized value is used as a
> size.
>  
> +Wanalyzer-out-of-bounds
> +Common Var(warn_analyzer_out_of_bounds) Init(1) Warning
> +Warn about code paths in which a write or read to a buffer is out-
> of-bounds.
> +

Please keep the list alphabetized; I think this needs to be between
  Wanalyzer-mismatching-deallocation 
and 
  Wanalyzer-possible-null-argument

>  Wanalyzer-use-after-free
>  Common Var(warn_analyzer_use_after_free) Init(1) Warning
>  Warn about code paths in which a freed value is used.
> diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-
> model.cc
> index f7df2fca245..2f9382ed96c 100644
> --- a/gcc/analyzer/region-model.cc
> +++ b/gcc/analyzer/region-model.cc
> @@ -1268,6 +1268,402 @@ region_model::on_stmt_pre (const gimple
> *stmt,
>  }
>  }
>  
> +/* Abstract base class for all out-of-bounds warnings.  */
> +
> +class out_of_bounds : public
> pending_diagnostic_subclass
> +{
> +public:
> +  out_of_bounds (const region *reg, tree diag_arg, byte_range range)
> +  : m_reg (reg), m_diag_arg (diag_arg), m_range (range)
> +  {}
> +
> +  const char *get_kind () const final override
> +  {
> +    return "out_of_bounds_diagnostic";
> +  }
> +
> +  bool operator== (const out_of_bounds &other) const
> +  {
> +    return m_reg == other.m_reg
> +  && m_range == other.m_range
> +  && pending_diagnostic::same_tree_p (m_diag_arg,
> other.m_diag_arg);
> +  }
> +
> +  int get_controlling_option () const final override
> +  {
> +    return OPT_Wanalyzer_out_of_bounds;
> +  }
> +
> +  void mark_interesting_stuff (interesting_t *interest) final
> override
> +  {
> +    interest->add_region_creation (m_reg);
> +  }
> +
> +protected:
> +  const region *m_reg;
> +  tree m_diag_arg;
> +  byte_range m_range;

Please add a comment clarifying what the meaning of m_range is here. 
Is it
(a) the range of all bytes that are accessed,
(b) the range of bytes that are accessed out-of-bounds,
(c) etc?

>From my reading of the patch I think it's (b).


> +};
> +
> +/* Abstract subclass to complaing about out-of-bounds
> +   past the end of the buffer.  */
> +
> +class past_the_end : public out_of_bounds
> +{
> +public:
> +  past_the_end (const region *reg, tree diag_arg, byte_range range,
> +   tree byte_bound)
> +  : out_of_bounds (reg, diag_arg, range), m_byte_bound (byte_bound)
> +  {}
> +
> +  bool operator== (const past_the_end &other) const
> +  {
> +    return m_reg == other.m_reg
> +  && m_range == other.m_range
> +  && pending_diagnostic::same_tree_p (m_diag_arg,
> other.m_diag_arg)

Is it possible to call
  out_of_bounds::operator== 
for the first three fields, rather than a copy-and-paste of the logic?

> +  && pending_diagnostic::same_tree_p (m_byte_bound,
> +  other.m_byte_bound);
> +  }
> +
> +  label_text
> +  describe_region_creation_event (const evdesc::region_creation &ev)
> final
> +  override
> +  {
> +    if (m_byte_bound && TREE_CODE (m_byte_bound) == INTEGER_CST)
> +  return ev.formatted_print ("capacity is %E bytes",
> m_byte_bound);
> +
> +    return label_text ();
> +  }
> +
> +protected:
> +  tree m_byte_bound;
> +};

[...snip the concrete subclasses...]

We went through several rounds of review off-list, and I have lots of
ideas for wording tweaks to the patch, but rather than me be a
"backseat driver" (or bikeshedding), I think that that aspect of the
patch is good enough as-is, and I'll make the wording changes myself
once the patch is in trunk.


[...snip...]

> +
> +    if (warned)
> +  {
> +   char num_bytes_past_buf[WIDE_INT_PRINT_BUFFER_SIZE];
> +   print_dec (m_range.m_size_in_bytes, num_bytes_past_buf,
> UNSIGNED);

I think we can use %wu for this, but I can fix this up in a followup.


[...snip...]

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index fa23fbe..5ab834af780 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -459,6 +459,7 @@ Objective-C and Objective-C++ 

[PATCH 2/2] analyzer: out-of-bounds checker [PR106000]

2022-08-09 Thread Tim Lange
This patch adds an experimental out-of-bounds checker to the analyzer.

The checker was tested on coreutils, curl, httpd and openssh. It is mostly
accurate but does produce false-positives on yacc-generated files and
sometimes when the analyzer misses an invariant. These cases will be
documented in bugzilla.
(Regrtests still running with the latest changes, will report back later.)

2022-08-09  Tim Lange  

gcc/analyzer/ChangeLog:

PR analyzer/106000
* analyzer.opt: Add Wanalyzer-out-of-bounds.
* region-model.cc (class out_of_bounds): Diagnostics base class
for all out-of-bounds diagnostics.
(class past_the_end): Base class derived from out_of_bounds for
the buffer_overflow and buffer_overread diagnostics.
(class buffer_overflow): Buffer overflow diagnostics.
(class buffer_overread): Buffer overread diagnostics.
(class buffer_underflow): Buffer underflow diagnostics.
(class buffer_underread): Buffer overread diagnostics.
(region_model::check_region_bounds): New function to check region
bounds for out-of-bounds accesses.
(region_model::check_region_access):
Add call to check_region_bounds.
(region_model::get_representative_tree): New function that accepts
a region instead of an svalue.
* region-model.h (class region_model):
Add region_model::check_region_bounds.
* region.cc (region::symbolic_p): New predicate. 
(offset_region::get_byte_size_sval): Only return the remaining
byte size on offset_regions.
* region.h: Add region::symbolic_p.
* store.cc (byte_range::intersects_p):
Add new function equivalent to bit_range::intersects_p.
(byte_range::exceeds_p): New function.
(byte_range::falls_short_of_p): New function.
* store.h (struct byte_range): Add byte_range::intersects_p,
byte_range::exceeds_p and byte_range::falls_short_of_p.

gcc/ChangeLog:

PR analyzer/106000
* doc/invoke.texi: Add Wanalyzer-out-of-bounds.

gcc/testsuite/ChangeLog:

PR analyzer/106000
* gcc.dg/analyzer/allocation-size-3.c:
Disable out-of-bounds warning.
* gcc.dg/analyzer/memcpy-2.c: Disable out-of-bounds warning.
* gcc.dg/analyzer/pr101962.c: Add dg-warning.
* gcc.dg/analyzer/pr97029.c:
Add dummy buffer to prevent an out-of-bounds warning.
* gcc.dg/analyzer/test-setjmp.h:
Add dummy buffer to prevent an out-of-bounds warning.
* gcc.dg/analyzer/zlib-3.c: Add dg-bogus.
* gcc.dg/analyzer/out-of-bounds-1.c: New test.
* gcc.dg/analyzer/out-of-bounds-2.c: New test.
* gcc.dg/analyzer/out-of-bounds-3.c: New test.
* gcc.dg/analyzer/out-of-bounds-container_of.c: New test.
* gcc.dg/analyzer/out-of-bounds-coreutils.c: New test.
* gcc.dg/analyzer/out-of-bounds-curl.c: New test.

---
 gcc/analyzer/analyzer.opt |   4 +
 gcc/analyzer/region-model.cc  | 410 ++
 gcc/analyzer/region-model.h   |   3 +
 gcc/analyzer/region.cc|  32 ++
 gcc/analyzer/region.h |   4 +
 gcc/analyzer/store.cc |  67 +++
 gcc/analyzer/store.h  |   9 +
 gcc/doc/invoke.texi   |  12 +
 .../gcc.dg/analyzer/allocation-size-3.c   |   2 +
 gcc/testsuite/gcc.dg/analyzer/memcpy-2.c  |   2 +-
 .../gcc.dg/analyzer/out-of-bounds-1.c | 119 +
 .../gcc.dg/analyzer/out-of-bounds-2.c |  83 
 .../gcc.dg/analyzer/out-of-bounds-3.c |  91 
 .../analyzer/out-of-bounds-container_of.c |  51 +++
 .../gcc.dg/analyzer/out-of-bounds-coreutils.c |  29 ++
 .../gcc.dg/analyzer/out-of-bounds-curl.c  |  41 ++
 gcc/testsuite/gcc.dg/analyzer/pr101962.c  |   5 +-
 gcc/testsuite/gcc.dg/analyzer/pr97029.c   |   4 +-
 gcc/testsuite/gcc.dg/analyzer/test-setjmp.h   |   4 +-
 gcc/testsuite/gcc.dg/analyzer/zlib-3.c|   4 +-
 20 files changed, 970 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-container_of.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-coreutils.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-curl.c

diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
index 5021376b6fb..8e73af60ceb 100644
--- a/gcc/analyzer/analyzer.opt
+++ b/gcc/analyzer/analyzer.opt
@@ -158,6 +158,10 @@ Wanalyzer-tainted-size
 Common Var(warn_analyzer_tainted_size) Init(1) Warning
 Warn about code paths in which an unsanitized value is used as a size.
 
+Wanalyzer-out-of-bounds
+Common Var(warn_analyzer_out_of_bounds) Init(1) Wa