Re: [PATCH 2/2] analyzer: strcpy and strncpy semantics

2022-09-02 Thread David Malcolm via Gcc-patches
On Fri, 2022-09-02 at 16:08 +0200, Tim Lange wrote:
> Hi,
> 
> below is my patch for the strcpy and strncpy semantics inside the
> analyzer, enabling the out-of-bounds checker to also complain about
> overflows caused by those two functions.
> 
> As the plan is to reason about the inequality of symbolic values in
> the
> future, I decided to use eval_condition to compare the number of
> bytes and
> the string size for strncpy [0].
> 
> - Tim
> 
> [0] instead of only trying to handle cases where svalues are
> constant;
>     which was how I did it in an earlier draft discussed off-list.
> 
> 
> This patch adds modelling for the semantics of strcpy and strncpy in
> the
> simple case where the analyzer is able to reason about the inequality
> of
> the size argument and the string size.
> 
> Regrtested on Linux x86_64.

Thanks for the patch.

The strcpy part looks great, but strncpy has some tricky behavior that
isn't fully modeled by the patch; see:

https://en.cppreference.com/w/c/string/byte/strncpy

For example: if the src string with null terminator is shorter than
"count", then dest is padded with additional null characters up to
"count".

You could split out the strcpy part of the patch, as that seems to be
ready to go as-is, and do the strncpy part as a followup if you like;
some further notes below...

[...snip...]
> 
> +/* Handle the on_call_pre part of "strncpy" and
> "__builtin_strncpy_chk".  */
>  
> -  /* For now, just mark region's contents as unknown.  */
> -  mark_region_as_unknown (dest_reg, cd.get_uncertainty ());
> +void
> +region_model::impl_call_strncpy (const call_details )
> +{
> +  const svalue *dest_sval = cd.get_arg_svalue (0);
> +  const region *dest_reg = deref_rvalue (dest_sval, cd.get_arg_tree
> (0),
> +    cd.get_ctxt ());
> +  const svalue *src_sval = cd.get_arg_svalue (1);
> +  const region *src_reg = deref_rvalue (src_sval, cd.get_arg_tree
> (1),
> +   cd.get_ctxt ());
> +  const svalue *src_contents_sval = get_store_value (src_reg,
> +    cd.get_ctxt ());
> +  const svalue *num_bytes_sval = cd.get_arg_svalue (2);
> +
> +  cd.maybe_set_lhs (dest_sval);
> +
> +  const svalue *string_size_sval = get_string_size (src_reg);
> +  if (string_size_sval->get_kind () == SK_UNKNOWN)
> +    string_size_sval = get_string_size (src_contents_sval);
> +
> +  /* strncpy copies until a zero terminator is reached or n bytes
> were copied.
> + Determine the lesser of both here.  */
> +  tristate ts = eval_condition (string_size_sval, LT_EXPR,
> num_bytes_sval);
> +  const svalue *copied_bytes_sval;
> +  switch (ts.get_value ())
> +    {
> +  case tristate::TS_TRUE:
> +   copied_bytes_sval = string_size_sval;

This is the
  strlen(src) + 1 < count
case, and thus we want to do a zero-fill of size:
   bin_op(num_bytes_sval, MINUS_EXPR, copied_bytes_sval)
in the relevant subregion of dest_reg.
Or perhaps this could be expressed by first doing a full zero-fill of
the num_bytes_sval-sized region of dest_reg, and then copying the src
contents.

> +   break;
> +  case tristate::TS_FALSE:
> +   copied_bytes_sval = num_bytes_sval;
> +   break;
> +  case tristate::TS_UNKNOWN:
> +   copied_bytes_sval
> + = m_mgr->get_or_create_unknown_svalue (size_type_node);
> +   break;
> +  default:
> +   gcc_unreachable ();
> +    }
> +
> +  const region *sized_dest_reg = m_mgr->get_sized_region (dest_reg,
> NULL_TREE,
> +
> copied_bytes_sval);
> +  set_value (sized_dest_reg, src_contents_sval, cd.get_ctxt ());
>  }
> 

[...snip...]
> 

> b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-4.c
> new file mode 100644
> index 000..382f0fb5ef4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-4.c
> @@ -0,0 +1,122 @@
> +/* { dg-additional-options "-Wno-stringop-overflow -Wno-stringop-
> truncation" } */
> +#include 
> +#include 
> +#include 
> +
> +/* Wanalyzer-out-of-bounds tests for str(n)py-related overflows.
> +  
> +   The intra-procedural tests are all catched by Wstringop-overflow.

Nit: "catched" -> "caught".

[...snip...]

> diff --git a/gcc/testsuite/gcc.dg/analyzer/strncpy-1.c
> b/gcc/testsuite/gcc.dg/analyzer/strncpy-1.c
> new file mode 100644
> index 000..ea051eb761a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/strncpy-1.c
> @@ -0,0 +1,23 @@
> +#include 
> +#include "analyzer-decls.h"
> +
> +void test_1 (void)
> +{
> +  char str[] = "Hello";
> +  char buf[6];
> +  char *result = strncpy (buf, str, 6);
> +  __analyzer_describe (1, result); /* { dg-warning
> "region_svalue.*?'buf'" } */
> +  __analyzer_eval (result == buf); /* { dg-warning "TRUE" } */
> +  __analyzer_eval (buf[0] == 'H'); /* { dg-warning "TRUE" } */
> +  __analyzer_eval (buf[1] == 'e'); /* { dg-warning "TRUE" } */
> +  __analyzer_eval (buf[2] == 'l'); /* { dg-warning "TRUE" } */
> 

[PATCH 2/2] analyzer: strcpy and strncpy semantics

2022-09-02 Thread Tim Lange
Hi,

below is my patch for the strcpy and strncpy semantics inside the
analyzer, enabling the out-of-bounds checker to also complain about
overflows caused by those two functions.

As the plan is to reason about the inequality of symbolic values in the
future, I decided to use eval_condition to compare the number of bytes and
the string size for strncpy [0].

- Tim

[0] instead of only trying to handle cases where svalues are constant;
which was how I did it in an earlier draft discussed off-list.


This patch adds modelling for the semantics of strcpy and strncpy in the
simple case where the analyzer is able to reason about the inequality of
the size argument and the string size.

Regrtested on Linux x86_64.

2022-09-02  Tim Lange  

gcc/analyzer/ChangeLog:

* region-model-impl-calls.cc (region_model::impl_call_strncpy):
New function.
* region-model.cc (region_model::on_call_pre):
Add call to impl_call_strncpy.
(region_model::get_string_size): New function.
* region-model.h (class region_model):
Add impl_call_strncpy and get_string_size.

gcc/testsuite/ChangeLog:

* gcc.dg/analyzer/out-of-bounds-4.c: New test.
* gcc.dg/analyzer/strcpy-3.c: New test.
* gcc.dg/analyzer/strncpy-1.c: New test.

---
 gcc/analyzer/region-model-impl-calls.cc   |  62 -
 gcc/analyzer/region-model.cc  |  33 +
 gcc/analyzer/region-model.h   |   4 +
 .../gcc.dg/analyzer/out-of-bounds-4.c | 122 ++
 gcc/testsuite/gcc.dg/analyzer/strcpy-3.c  |  23 
 gcc/testsuite/gcc.dg/analyzer/strncpy-1.c |  23 
 6 files changed, 264 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-4.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/strcpy-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/strncpy-1.c

diff --git a/gcc/analyzer/region-model-impl-calls.cc 
b/gcc/analyzer/region-model-impl-calls.cc
index 8eebd122d42..9f1ae020f4f 100644
--- a/gcc/analyzer/region-model-impl-calls.cc
+++ b/gcc/analyzer/region-model-impl-calls.cc
@@ -1019,13 +1019,69 @@ region_model::impl_call_strcpy (const call_details )
   const svalue *dest_sval = cd.get_arg_svalue (0);
   const region *dest_reg = deref_rvalue (dest_sval, cd.get_arg_tree (0),
 cd.get_ctxt ());
+  const svalue *src_sval = cd.get_arg_svalue (1);
+  const region *src_reg = deref_rvalue (src_sval, cd.get_arg_tree (1),
+   cd.get_ctxt ());
+  const svalue *src_contents_sval = get_store_value (src_reg,
+cd.get_ctxt ());
 
   cd.maybe_set_lhs (dest_sval);
 
-  check_region_for_write (dest_reg, cd.get_ctxt ());
+  /* Try to get the string size if SRC_REG is a string_region.  */
+  const svalue *copied_bytes_sval = get_string_size (src_reg);
+  /* Otherwise, check if the contents of SRC_REG is a string.  */
+  if (copied_bytes_sval->get_kind () == SK_UNKNOWN)
+copied_bytes_sval = get_string_size (src_contents_sval);
+
+  const region *sized_dest_reg
+= m_mgr->get_sized_region (dest_reg, NULL_TREE, copied_bytes_sval);
+  set_value (sized_dest_reg, src_contents_sval, cd.get_ctxt ());
+}
+
+/* Handle the on_call_pre part of "strncpy" and "__builtin_strncpy_chk".  */
 
-  /* For now, just mark region's contents as unknown.  */
-  mark_region_as_unknown (dest_reg, cd.get_uncertainty ());
+void
+region_model::impl_call_strncpy (const call_details )
+{
+  const svalue *dest_sval = cd.get_arg_svalue (0);
+  const region *dest_reg = deref_rvalue (dest_sval, cd.get_arg_tree (0),
+cd.get_ctxt ());
+  const svalue *src_sval = cd.get_arg_svalue (1);
+  const region *src_reg = deref_rvalue (src_sval, cd.get_arg_tree (1),
+   cd.get_ctxt ());
+  const svalue *src_contents_sval = get_store_value (src_reg,
+cd.get_ctxt ());
+  const svalue *num_bytes_sval = cd.get_arg_svalue (2);
+
+  cd.maybe_set_lhs (dest_sval);
+
+  const svalue *string_size_sval = get_string_size (src_reg);
+  if (string_size_sval->get_kind () == SK_UNKNOWN)
+string_size_sval = get_string_size (src_contents_sval);
+
+  /* strncpy copies until a zero terminator is reached or n bytes were copied.
+ Determine the lesser of both here.  */
+  tristate ts = eval_condition (string_size_sval, LT_EXPR, num_bytes_sval);
+  const svalue *copied_bytes_sval;
+  switch (ts.get_value ())
+{
+  case tristate::TS_TRUE:
+   copied_bytes_sval = string_size_sval;
+   break;
+  case tristate::TS_FALSE:
+   copied_bytes_sval = num_bytes_sval;
+   break;
+  case tristate::TS_UNKNOWN:
+   copied_bytes_sval
+ = m_mgr->get_or_create_unknown_svalue (size_type_node);
+   break;
+  default:
+   gcc_unreachable ();
+}
+
+  const region