On Tue, 2022-08-09 at 23:19 +0200, Tim Lange wrote:
> This patch adds an experimental out-of-bounds checker to the
> analyzer.
>
> The checker was tested on coreutils, curl, httpd and openssh. It is
> mostly
> accurate but does produce false-positives on yacc-generated files and
> sometimes when the analyzer misses an invariant. These cases will be
> documented in bugzilla.
> (Regrtests still running with the latest changes, will report back
> later.)
Hi Tim, thanks for the patch, and for all the testing you've done on
it.
We've already had several rounds of review of this off-list, and this
patch looks very close to ready.
Some nits below...
> diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
> index 5021376b6fb..8e73af60ceb 100644
> --- a/gcc/analyzer/analyzer.opt
> +++ b/gcc/analyzer/analyzer.opt
> @@ -158,6 +158,10 @@ Wanalyzer-tainted-size
> Common Var(warn_analyzer_tainted_size) Init(1) Warning
> Warn about code paths in which an unsanitized value is used as a
> size.
>
> +Wanalyzer-out-of-bounds
> +Common Var(warn_analyzer_out_of_bounds) Init(1) Warning
> +Warn about code paths in which a write or read to a buffer is out-
> of-bounds.
> +
Please keep the list alphabetized; I think this needs to be between
Wanalyzer-mismatching-deallocation
and
Wanalyzer-possible-null-argument
> Wanalyzer-use-after-free
> Common Var(warn_analyzer_use_after_free) Init(1) Warning
> Warn about code paths in which a freed value is used.
> diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-
> model.cc
> index f7df2fca245..2f9382ed96c 100644
> --- a/gcc/analyzer/region-model.cc
> +++ b/gcc/analyzer/region-model.cc
> @@ -1268,6 +1268,402 @@ region_model::on_stmt_pre (const gimple
> *stmt,
> }
> }
>
> +/* Abstract base class for all out-of-bounds warnings. */
> +
> +class out_of_bounds : public
> pending_diagnostic_subclass
> +{
> +public:
> + out_of_bounds (const region *reg, tree diag_arg, byte_range range)
> + : m_reg (reg), m_diag_arg (diag_arg), m_range (range)
> + {}
> +
> + const char *get_kind () const final override
> + {
> + return "out_of_bounds_diagnostic";
> + }
> +
> + bool operator== (const out_of_bounds &other) const
> + {
> + return m_reg == other.m_reg
> + && m_range == other.m_range
> + && pending_diagnostic::same_tree_p (m_diag_arg,
> other.m_diag_arg);
> + }
> +
> + int get_controlling_option () const final override
> + {
> + return OPT_Wanalyzer_out_of_bounds;
> + }
> +
> + void mark_interesting_stuff (interesting_t *interest) final
> override
> + {
> + interest->add_region_creation (m_reg);
> + }
> +
> +protected:
> + const region *m_reg;
> + tree m_diag_arg;
> + byte_range m_range;
Please add a comment clarifying what the meaning of m_range is here.
Is it
(a) the range of all bytes that are accessed,
(b) the range of bytes that are accessed out-of-bounds,
(c) etc?
>From my reading of the patch I think it's (b).
> +};
> +
> +/* Abstract subclass to complaing about out-of-bounds
> + past the end of the buffer. */
> +
> +class past_the_end : public out_of_bounds
> +{
> +public:
> + past_the_end (const region *reg, tree diag_arg, byte_range range,
> + tree byte_bound)
> + : out_of_bounds (reg, diag_arg, range), m_byte_bound (byte_bound)
> + {}
> +
> + bool operator== (const past_the_end &other) const
> + {
> + return m_reg == other.m_reg
> + && m_range == other.m_range
> + && pending_diagnostic::same_tree_p (m_diag_arg,
> other.m_diag_arg)
Is it possible to call
out_of_bounds::operator==
for the first three fields, rather than a copy-and-paste of the logic?
> + && pending_diagnostic::same_tree_p (m_byte_bound,
> + other.m_byte_bound);
> + }
> +
> + label_text
> + describe_region_creation_event (const evdesc::region_creation &ev)
> final
> + override
> + {
> + if (m_byte_bound && TREE_CODE (m_byte_bound) == INTEGER_CST)
> + return ev.formatted_print ("capacity is %E bytes",
> m_byte_bound);
> +
> + return label_text ();
> + }
> +
> +protected:
> + tree m_byte_bound;
> +};
[...snip the concrete subclasses...]
We went through several rounds of review off-list, and I have lots of
ideas for wording tweaks to the patch, but rather than me be a
"backseat driver" (or bikeshedding), I think that that aspect of the
patch is good enough as-is, and I'll make the wording changes myself
once the patch is in trunk.
[...snip...]
> +
> + if (warned)
> + {
> + char num_bytes_past_buf[WIDE_INT_PRINT_BUFFER_SIZE];
> + print_dec (m_range.m_size_in_bytes, num_bytes_past_buf,
> UNSIGNED);
I think we can use %wu for this, but I can fix this up in a followup.
[...snip...]
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index fa23fbe..5ab834af780 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -459,6 +459,7 @@ Objective-C and Objective-C++