Re: [PATCH] warn for integer overflow in allocation calls (PR 96838)

Martin Sebor via Gcc-patches Mon, 09 Nov 2020 08:00:34 -0800

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554000.html


Jeff, I don't expect to have the cycles to reimplement this patch
using the Ranger APIs before stage 1 closes.  I'm open to giving
it a try in stage 3 if it's still in scope for GCC 11.  Otherwise,
is this patch okay to commit?

On 9/21/20 9:13 AM, Martin Sebor wrote:

On 9/20/20 12:39 AM, Aldy Hernandez wrote:
On 9/19/20 11:22 PM, Martin Sebor wrote:
On 9/18/20 12:29 AM, Aldy Hernandez wrote:
On 9/17/20 10:18 PM, Martin Sebor wrote:
On 9/17/20 12:39 PM, Andrew MacLeod wrote:
On 9/17/20 12:08 PM, Martin Sebor via Gcc-patches wrote:
On 9/16/20 9:23 PM, Jeff Law wrote:
On 9/15/20 1:47 PM, Martin Sebor wrote:
Overflowing the size of a dynamic allocation (e.g., malloc or VLA)
can lead to a subsequent buffer overflow corrupting the heap or
stack.  The attached patch diagnoses a subset of these cases where
the overflow/wraparound is still detectable.

Besides regtesting GCC on x86_64-linux I also verified the warning
doesn't introduce any false positives into Glibc or Binutils/GDB
builds on the same target.

Martin

gcc-96838.diff
PR middle-end/96838 - missing warning on integer overflow in calls to allocation functions
gcc/ChangeLog:

    PR middle-end/96838
    * calls.c (eval_size_vflow): New function.
    (get_size_range): Call it.  Add argument.
    (maybe_warn_alloc_args_overflow): Diagnose overflow/wraparound.
    * calls.h (get_size_range): Add argument.

gcc/testsuite/ChangeLog:

    PR middle-end/96838
    * gcc.dg/Walloc-size-larger-than-19.c: New test.
    * gcc.dg/Walloc-size-larger-than-20.c: New test.
If an attacker can control an integer overflow that feeds an allocation, then they can do all kinds of bad things. In fact, when my son was asking me attack vectors, this is one I said I'd look at if I were a bad guy.
I'm a bit surprised you can't just query the range of the argument and get the overflow status out of that range, but I don't see that in the APIs. How painful would it be to make that part of the API? The conceptual model would be to just ask for the range of the argument to malloc which would include the range and a status bit indicating the computation might have overflowed.
Do we know if it did/would have wrapped? Sure, since we have to do the math. So you are correct in that the information is there. But is it useful?
We are in the very annoying habit of subtracting one by adding 0xFFFFFFF, which means you get an overflow for unsigned when you subtract one. From what I have seen of unsigned math, we would be flagging very many operations as overflows, so you would still have the difficulty of figuring out whether it's a "real" overflow or a fake one because of the way we do unsigned math.
You and me both :)
At the very start, I did have an overflow flag in the range class... but it was turning out to be fairly useless so it was removed.
I agree that being able to evaluate an expression in an as-if
infinite precision (in addition to its type) would be helpful.
So again, we get back to adding 0x0fffff when we are trying to subtract one... now, with infinite precision you are going to see
[2,10] - 1 we end up with [2,10]+0xFFFFFF, which will now give you [0x100000001, 0x100000009] so it's going to look like it overflowed?
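To make the point above concrete, here is a minimal sketch in plain C (not GCC internals). It assumes a 32-bit unsigned type, so the all-ones constant is 0xFFFFFFFF, and it uses 64 bits as a stand-in for infinite precision. The function names are made up for illustration.

```c
#include <stdint.h>

/* Subtract one from a 32-bit unsigned value the way the IL often
   expresses it: by adding the all-ones constant.  */
static uint32_t sub_one_wrapping (uint32_t x)
{
  return x + UINT32_C (0xFFFFFFFF);   /* wraps modulo 2^32 */
}

/* The same addition carried out in 64 bits, standing in for
   "infinite" precision.  */
static uint64_t sub_one_wide (uint32_t x)
{
  return (uint64_t) x + UINT64_C (0xFFFFFFFF);
}

/* Nonzero if the wide result does not fit back in 32 bits.  */
static int looks_like_overflow (uint32_t x)
{
  return sub_one_wide (x) > UINT32_MAX;
}
```

For x in [2,10] the wide results are [0x100000001, 0x100000009], so looks_like_overflow reports an overflow for a harmless decrement. For x == 0, where the decrement really does wrap to 0xFFFFFFFF, the wide sum fits and nothing is flagged.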
But just to make sure I understood correctly, let me ask again
using an example:

  void* f (size_t n)
  {
    if (n < PTRDIFF_MAX / 2)
      n = PTRDIFF_MAX / 2;

    return malloc (n * sizeof (int));
  }

Can the unsigned wraparound in the argument be readily detected?

On trunk, this ends up with the following:

  # RANGE [4611686018427387903, 18446744073709551615]
  _6 = MAX_EXPR <n_2(D), 4611686018427387903>;
  # RANGE [0, 18446744073709551615] NONZERO 18446744073709551612
  _1 = _6 * 4;
  ...
  p_5 = mallocD.1206 (_1); [tail call]
  ...
  return p_5;

so _1's range reflects the wraparound in size_t, but _6's range
has enough information to uncover it.  So detecting it is possible
and is done in the patch so we get a warning:
warning: argument 1 range [18446744073709551612,0x3fffffffffffffffc] is too large to represent in ‘long unsigned int’ [-Walloc-size-larger-than=]
    6 |   return malloc (n * sizeof (int));
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~
The code is very simplistic and only handles a small subset of cases. It could be generalized and exposed by a more generic API but it does seem like the ranger must already have all the logic built into it so
if it isn't exposed now it should be a matter of opening it up.
everything is exposed in range-ops.  well, mostly.
if we have _1 = _6 * 4
if one wanted to do that in infinite precision, you query the range for _6, and the range for 4 (which would be [4,4] :-)
range_of_expr (op1r, _6, stmt)
range_of_expr (op2r, 4, stmt)
you could take their current types, and cast those ranges to whatever the next higher precision is,
range_cast  (op1r, highertype)
range_cast (op2r, highertype)
then invoke the operation on those parameters

gimple_range_fold (r, stmt,  op1r, op2r)
and that will do your operation in the higher precision. You could compare that to the value in regular precision too I suppose.
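The steps above (query the operand ranges, cast them to a wider type, fold the operation) can be sketched in standalone C. This is only an illustration under stated assumptions: range32, range64, widen, fold_mult and may_wrap are hypothetical stand-ins, not the ranger or range-ops API, and a 32-bit type plays the role of size_t so that 64 bits is wide enough for any product.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for ranges in a 32-bit type and in the
   next wider precision.  These are not GCC types.  */
struct range32 { uint32_t lo, hi; };
struct range64 { uint64_t lo, hi; };

/* Analogue of casting a range to a higher precision.  */
static struct range64 widen (struct range32 r)
{
  assert (r.lo <= r.hi);
  struct range64 w = { r.lo, r.hi };
  return w;
}

/* Fold a multiplication of two widened ranges.  Both operands came
   from 32-bit values, so the 64-bit products cannot wrap.  */
static struct range64 fold_mult (struct range64 a, struct range64 b)
{
  struct range64 r = { a.lo * b.lo, a.hi * b.hi };
  return r;
}

/* Nonzero if some value in the wide result is not representable in
   the original 32-bit type.  */
static int may_wrap (struct range64 r)
{
  return r.hi > UINT32_MAX;
}
```

With n in [0x3fffffff, 0xffffffff] and the constant range [4,4], the 32-bit analogue of the _6 * 4 case, the wide result is [0xfffffffc, 0x3fffffffc] and may_wrap returns nonzero.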
The patch does pretty much exactly what you described, except in
offset_int, and only for a limited set of arithmetic operations.
It figures out if an unsigned expression wrapped simply by checking
to see if the mathematically correct result fits in its type.

It sounds like I should be able to use the range APIs above instead
and get the same result, but for arbitrary expressions.  I don't
need to (and very well can't) compare the infinitely precise result
to the regular result because when it's a range that wraps the result
is the full range of the type (i.e., [0, SIZE_MAX] in the test above).
The ranger is designed to track ranges as they are represented in the IL. You are asking for us to interpret the IL as something other than what is there, and increase the precision. That's a different task.
Could that be done? Maybe. It might involve parallel tracking of SSA name ranges in a higher precision... and recalculating every expression using those values. I'm not convinced that a generalized ranger which works in higher precision is really going to solve the problem the way you hope it would. I think we'd get a lot of false info when unsigned values are involved.
I don't think that's necessary for unsigned wrapping.  Most of
the time the (possibly) wrapped result is what we need because
that's what the languages say is supposed to happen.  The cases
when we need to check for the dangerous wraparound (or overflow)
should be rare (just allocation calls) and can be handled on
demand by walking the IL and doing the computation in the infinite
precision.

For signed overflow, though, computing the mathematically correct
result seems preferable.  Or at least tracking (and propagating)
the overflow bit should be.  That way we could tell when
an expression is definitely invalid.
If I were to see a clear solution/description as to what is a problematic overflow and what isn't, there might be something more general that could be done...
Does the distinction above (unsigned wrapping/signed overflow) help?
I mentioned it to Aldy in response to the irange best practices
document.  He says there's no way to do that and no plans to
make it possible.
That's not quite what he said. He said "If the operation may wrap or overflow, it will show up as so in the range. I don't think we have any plans of changing this behavior."
In fact, looking back, he said the same thing I just did... There IS a mechanism there for working in higher precision should one desire to do so, but as I pointed out above, I'm not convinced as a general thing in the ranger it will work how you imagine.
My guess is what you really want is to be invoking range-ops on stmts you care about whether they might overflow or not with higher precision versions and then identify the cases which trigger that you actually care about.
Yes, it looks like that's the way to go.  I was hoping the ranger
had an API that would do all that for me (i.e., walk the IL and
compute the range of the result on demand in the precision of
my choice), since it has to do that as it is, except that it uses
the precision of the expression.  But if not, it sounds like with
range_ops I should be able to fairly easily write one myself.
The range_ops class is too low-level for this. It folds pairs of ranges, and has no concept of gimple or tree expressions. To roll your own solution you'd have to walk the IL, parse the gimple statements and then call range-ops on your adjusted arguments. A better approach would be to overload range_of_stmt as Andrew suggested (see below).
So maybe you want an "overflow calculator" which is similar to the ranger, and identifies which kinds of stmts expect certain range of results, and if they get results outside of the expected ranges, triggers a flag. Maybe it could convert all unsigned and signed values to signed, and then work in a higher precision, and then if any intermediate result can't be accurately cast back to the original type, it gets flagged? Is that the kind of overflow that is cared about?
There are two kinds of "overflow:" unsigned wrapping and signed
overflow.  Both are a problem in allocation calls like:

   int *p = malloc (nelts * size);

If nelts and size are signed, the compile-time result seems to
be done in saturation arithmetic (its range looks to be capped
at TYPE_MAX).  If they are unsigned, they wrap around zero.
Neither is usually intended by the user but the wraparound is
the more dangerous of the two because it's well-defined and
because it shrinks the size to a very small number.
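As an illustration of the unsigned case, the sketch below contrasts the wrapping size computation with a checked one. It is not part of the patch. wrapped_bytes and alloc_ints are hypothetical helpers, and __builtin_mul_overflow is the GCC/Clang builtin that reports whether the mathematically correct product fits in the result type.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* The size as malloc (nelts * sizeof (int)) would see it: the
   product wraps modulo SIZE_MAX + 1.  */
static size_t wrapped_bytes (size_t nelts)
{
  return nelts * sizeof (int);
}

/* Hypothetical helper: allocate NELTS ints, failing instead of
   wrapping when NELTS * sizeof (int) does not fit in size_t.  */
static int *alloc_ints (size_t nelts)
{
  size_t bytes;
  if (__builtin_mul_overflow (nelts, sizeof (int), &bytes))
    return NULL;
  return malloc (bytes);
}
```

For nelts == SIZE_MAX / sizeof (int) + 2 the wrapped size is just sizeof (int) bytes, which is the "very small number" case, whereas alloc_ints returns NULL for the same argument.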
I don't know... this isn't my space :-) I'm just telling you that I'm not sure the current range query mechanism is the right place to be asking if an "overflow" occurred because I think the general answer won't be useful... too many kinds of overflows thrown together.
I think this requires more specific detailed information, and maybe a simple overflow analyzer will do the trick based on range-ops... I think range-ops has the required abilities. Whether it needs some enhancing, or something else exposed, I'm not sure yet.
Thank you!  This was a useful clarification to what Aldy said.
I'd been meaning to follow up with him on this point when I was
done with what I'm working on but Jeff's question just got me
the answers I was looking for quicker.  Let me experiment with
range_ops and get back to you.
Andrew
PS One could always take a ranger and overload the range_of_stmt() routine that calculates results, so that it calls current functionality, then try converting the arguments to higher precision ranges and re-invoking range-ops on those arguments, get a new higher precision range back and then look at the results vs the normal ones. Might be a place to start experimenting... then pass a flag along saying an overflow occurred.
That sounds like something to look into.
This would be my preferred approach. It's clean and builds on top of the current infrastructure without polluting other users of the ranger. It definitely sounds like something worth pursuing.
I don't see range_of_stmt on trunk.  Is that the right name or has it not been merged yet?
This is in the ranger proper which should come in, in the next week or two (??). You can see it in our staging branch (users/aldyh/ranger-staging):
class gimple_ranger : public range_query
{
public:
   gimple_ranger (bool use_loop_info) : m_use_loop_info(use_loop_info) { }
   virtual bool range_of_stmt (irange &r, gimple *, tree name = NULL) OVERRIDE;
   virtual bool range_of_expr (irange &r, tree name, gimple * = NULL) OVERRIDE;
   virtual bool range_on_edge (irange &r, edge e, tree name) OVERRIDE;
   virtual void range_on_entry (irange &r, basic_block bb, tree name);
   virtual void range_on_exit (irange &r, basic_block bb, tree name);
...
...
};
The idea is that range_of_stmt() is called for all statements as the IL is walked backwards. You could just overload this, build a throwaway statement you could pass to gimple_ranger::range_of_stmt, and work from there.
Thanks for the tutorial!  I see the patch you posted with this API
so I'll give it a try once it's in.
Note that the patch I posted was to standardize valuation throughout the compiler. It's not the ranger per se, but the underlying API for querying ranges that ranger, vr_values, etc will use. So you need to wait for the full ranger to work on this.
I see.  In that case it might make sense to commit the patch as is
and switch it over to the ranger API when it's in.  Jeff, what's
your preference?

Martin
