+/* Checks one MEM_REF in REF, located at LOCATION, for out-of-bounds
+   references to string constants.  If VRP can determine that the array
+   subscript is a constant, check if it is outside valid range.
+   If the array subscript is a RANGE, warn if it is non-overlapping
+   with valid range.
+   IGNORE_OFF_BY_ONE is true if the MEM_REF is inside an ADDR_EXPR.  */
This function doesn't have IGNORE_OFF_BY_ONE as a parameter.  Drop it
from the comment.

In the latest update (yet to be posted) the function uses it.

+  /* Determine the offsets and increment OFFRANGE for the bounds of each.  */
+  while (TREE_CODE (arg) == SSA_NAME)
+    {
+      gimple *def = SSA_NAME_DEF_STMT (arg);
+      if (!is_gimple_assign (def))
+       {
+         if (tree var = SSA_NAME_VAR (arg))
+           arg = var;
+         break;
+       }
What's the point of looking at the underlying SSA_NAME_VAR here? I can't
see how that's ever helpful.  You'll always exit the loop at this point
which does something like

if (TREE_CODE (arg) == ADDR_EXPR)
  {
     do something interesting
  }
else
  return;

ISTM that any time you dig into SSA_NAME_VAR (arg) what you're going to
get back is some kind of _DECL node -- I'm not aware of a case where
you're going to get back an ADDR_EXPR.

I have removed the code.  It was a vestige of something that
didn't work out and I didn't notice.

+
+      tree_code code = gimple_assign_rhs_code (def);
+      if (code == POINTER_PLUS_EXPR)
+       {
+         arg = gimple_assign_rhs1 (def);
+         varoff = gimple_assign_rhs2 (def);
+       }
+      else if (code == ASSERT_EXPR)
+       {
+         arg = TREE_OPERAND (gimple_assign_rhs1 (def), 0);
+         continue;
+       }
+      else
+       return;
+
+      if (TREE_CODE (varoff) != SSA_NAME)
+       break;
+
+      vr = get_value_range (varoff);
+      if (!vr || vr->type == VR_UNDEFINED || !vr->min || !vr->max)
+       break;
Doesn't this let VR_ANTI_RANGE through?  Why not instead

  if (!vr || vr->type != VR_RANGE || !vr->min || !vr->max)

?  I'm having trouble convincing myself the subsequent code will DTRT
for an anti-range.

The anti-range code adds PTRDIFF_MIN and PTRDIFF_MAX to
the offset (that's what ARRBOUNDS is initially set to, until
we have found the array that's being dereferenced).

Because of the loop bailing for an anti-range could be too
early (the subsequent iterations might compensate for
the conservative anti-range handling and find a bug in
offsets added later).  It's unlikely but so are all bugs
so I try to handle even corner cases.

+
+      if (TREE_CODE (vr->min) != INTEGER_CST
+          || TREE_CODE (vr->max) != INTEGER_CST)
+        break;
+
+      offset_int ofr[] = {
+       wi::to_offset (fold_convert (ptrdiff_type_node, vr->min)),
+       wi::to_offset (fold_convert (ptrdiff_type_node, vr->max))
+      };
+
+      if (vr->type == VR_RANGE)
+       {
+         if (ofr[0] < ofr[1])
+           {
+             offrange[0] += ofr[0];
+             offrange[1] += ofr[1];
+           }
+         else
+           {
+             offrange[0] += strbounds[0];
+             offrange[1] += strbounds[1];
+           }
When can the ELSE clause above happen for a VR_RANGE?

For a maximum in excess of PTRDIFF_MAX and a non-negative
minimum that's less than that.

+  /* At level 2 check also intermediate offsets.  */
+  int i = 0;
+  if (extrema[i] < -strbounds[1]
+      || extrema[i = 1] > strbounds[1] + eltsize)
+    {
+      HOST_WIDE_INT tmpidx = extrema[i].to_shwi () / eltsize.to_shwi ();
+
+      warning_at (location, OPT_Warray_bounds,
+                 "intermediate array offset %wi is outside array bounds "
+                 "of %qT",
+                 tmpidx,  strtype);
+      TREE_NO_WARNING (ref) = 1;
+    }
+}
This seems ill-advised.  All that matters is the actual index used in
the dereference.  Intermediate values (ie, address computations) along
the way are uninteresting -- we may form an address out of the bounds of
the array as an intermediate computation.  But the actual memory
reference will be within the range.

The idea is to help detect bugs that cannot be detected otherwise.
Take the example below:

  void g (int i)
  {
    if (i < 1 || 2 < i)
      i = 1;

    const char *p1 = "12" + i;
    const char *p2 = p1 + i;

    extern int x, y;

    x = p2[-4];   // #1: only valid if p2 is out of bounds
    y = p2[0];    // #2 therefore this must be out of bounds
  }

For the dereference at #1 to be valid i must be 2 (i.e.,
the upper bound of its range) and so p2 is therefore out
of bounds.

We may be able to compensate for it and compute the right
address at #1 but if the out-of-bounds value is used in
another dereference that makes a different assumption
(such as #2) one of the two is definitely wrong.  It might
be possible to do something smarter and determine if the
pointer really is used this way but I didn't want to
complicate the things too much so I put the logic under
level 2.

I will post an updated patch shortly.

Martin

Reply via email to