On Tue, 2021-03-02 at 07:09 +0000, brian.sobulefsky wrote:
> Hi,
> 
> It may not be worth altering at this point, but it seems like it
> would leave fewer bugs open if all the constraints go in as "closed"
> ranges and all evals are translated to closed intervals, so that
> if (idx > 0) and if (idx >= 1) become the same constraint. I know
> this won't be an option for eventual float support, but that is a
> different can of worms. For integers, you can fix it at those entry
> points, and then all other subroutines can ignore the issue of open
> or closed ranges.
> 
> I fully understand the eye glaze and did not want to have to write
> it that way; I am trying to think of a cleaner way to do it. Anyway,
> that is why I put a comment in each case to derive the result. This
> issue of "sides of the condition" and "inverted operator", as you
> call it in some places, is a recurring theme. It is especially
> irritating when we lose commutativity, as we do with MINUS.
> 
> Adding logic in my subroutine for MULT or DIV is not hard; handling
> overflow is a bit more involved. At the very least, we would need to
> know what the max or min of a particular variable is, which might be
> in the type tree. We would also need to define how we want to handle
> the issue.
> 
> The problem is (and I have been thinking about this a lot in terms
> of constraint merging) that there are currently no "or" constraints,
> which would be helpful in merging too. So for overflow, when you
> have something like
> 
> if (idx > 0)
>   {
>     idx += num;
>     __analyzer_eval (idx > num);
>   }
> 
> you have gone from a single constraint, (idx > 0), to an "or"
> condition, (idx > num || idx < MIN_INT + num). The only solution
> now, other than ignoring overflow as a tolerated bug, is to just
> pass it off as unhandled (and therefore UNKNOWN). Perhaps you would
> want to add overflow as one of your analyzer warnings when it cannot
> be ruled out?
> 
> I did try running a test with a simple || in the condition, just to
> see what would happen, and as you probably know it is not handled at
> all. I did not watch in gdb, but it is obvious from
> constraint-manager.cc that there is nothing to handle it. I think I
> actually did an __analyzer_eval() of the same || condition, verbatim
> from the if() conditional, and still got UNKNOWN.
> 
> It is a pretty intrusive change to add logic for that, which is why
> I have not done any work on it yet. Without the concept of "or" I
> don't see how we could handle overflow, but maybe you don't really
> want to handle it anyway, and would rather just emit a warning,
> since it is arguably poor practice to rely on something that is
> technically machine-dependent anyway.

I'm not sure how we should handle this.

One approach would be to generalize the constraint-handling so that we
store a set of "facts" about svalues, where the facts are themselves
svalues that are known to be true.

Hence we could build an svalue for the TRUTH_OR_EXPR
  (idx > num || idx < MIN_INT + num)
and store that as a fact within the constraint manager.
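
For illustration, a hypothetical testsuite-style case of what storing
that fact would enable (the expected result is aspirational, and I'm
spelling the minimum as INT_MIN from <limits.h> where the mail above
writes MIN_INT):

#include <limits.h>
extern void __analyzer_eval (int);  /* as in analyzer-decls.h */

void test_or_fact (int idx, int num)
{
  if (idx > num || idx < INT_MIN + num)
    /* If the TRUTH_OR_EXPR guard were recorded as a fact, re-querying
       the same disjunction could fold to TRUE; today, as noted above,
       it comes back UNKNOWN.  */
    __analyzer_eval (idx > num || idx < INT_MIN + num);
}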

But that would be a big rewrite.

Somehow the constraint manager needs to be able to evaluate queries
w.r.t. known facts, and probably also canonicalize sets of facts and
handle merging.

IIRC, the clang analyzer works by exposing an "add fact" interface
(where the facts can be compound symbolic expressions), while
implementing things internally with a choice of either a set of
ranges or a Z3-backed solver.


If we're considering overflow, I would like -fanalyzer to eventually
support bounds checking, e.g. detecting attacks due to buffer size
overflows (CWE-131 due to CWE-190), e.g. in this code that probably
should have used calloc:

struct foo *make_arr (size_t n)
{
  size_t sz = sizeof (struct foo) * n;
  struct foo *p = (struct foo *) malloc (sz);
  if (!p)
    return NULL;
  memset (p, 0, sz);
  return p;
}

void test (size_t n)
{
  struct foo *f = make_arr (n);
  if (!f)
    return;
  for (size_t i = 0; i < n; i++)
    {
      // ... do stuff with f[i]
    }
}
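
(As an aside, here's a sketch of the calloc variant alluded to above,
with a placeholder struct foo definition just for the sketch; a
conforming calloc either allocates the full array or fails, so the
size can't silently wrap, and it zero-fills, so the memset goes away
too:

#include <stdlib.h>

struct foo { int x[4]; };  /* placeholder definition for the sketch */

struct foo *make_arr_v2 (size_t n)
{
  /* calloc must return NULL if n * sizeof (struct foo) can't be
     represented, so the undersized-allocation case can't arise.  */
  return (struct foo *) calloc (n, sizeof (struct foo));
}
)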

It would be good to detect the case when sz in make_arr overflows and
thus the array is smaller than expected.  I think this could work by
recording that the allocated size of *p (== *f) is
   (size_t)(sizeof (struct foo) * n)

In the loop, the access to f[i] could bifurcate the egraph into:

  outcome A: (sizeof (struct foo) * i) < allocated_size
    (carry on, and have this recorded so no further checking is
needed on this path)

  outcome B: (sizeof (struct foo) * i) >= allocated_size
    (complain about buffer overflow and stop this path)

Outcome B thus occurs when:
  (sizeof (struct foo) * i) >= (size_t)(sizeof (struct foo) * n)
  && (i < n)
so, say, with sizeof (struct foo) == 16:
  (16 * i) >= (size_t) (16 * n)
  && (i < n)
and we could then (somehow) show that this can happen e.g. for
  n > (SIZE_MAX / 16).
(and I've probably messed up at least some of the logic in the above)
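
For concreteness, a small example of the wraparound, assuming a
64-bit size_t and sizeof (struct foo) == 16 as above:

#include <stdint.h>
#include <stdio.h>

int main (void)
{
  size_t n = (SIZE_MAX / 16) + 1;  /* 2**60 on a 64-bit target */
  size_t sz = 16 * n;              /* wraps around to 0 */
  printf ("n = %zu, sz = %zu\n", n, sz);
  return 0;
}

so make_arr would get a tiny (here zero-byte) allocation while test's
loop still runs up to n iterations over f[i].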

Obviously this would be huge scope creep.

I wonder if it's worth simply fixing the increment/decrement case for
now (though I think we're still affected by overflow, maybe punting on
that), and saving the more complicated cases for a rewrite that can
handle more general boolean symbolic expressions.
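
To make that concrete, a hypothetical testsuite-style case of what
"fixing the increment/decrement case" might look like (the expected
result is aspirational, and the overflow corner is punted on, as
above):

extern void __analyzer_eval (int);  /* as in analyzer-decls.h */

void test_incr (int idx)
{
  if (idx > 0)
    {
      idx += 1;
      /* Shifting the existing (idx > 0) range by the constant could
         let this fold to TRUE, modulo the INT_MAX wraparound corner.  */
      __analyzer_eval (idx > 1);
    }
}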

Thoughts?
Dave
