Re: [PATCH] warn for integer overflow in allocation calls (PR 96838)

Andrew MacLeod via Gcc-patches Thu, 17 Sep 2020 11:40:26 -0700

On 9/17/20 12:08 PM, Martin Sebor via Gcc-patches wrote:

On 9/16/20 9:23 PM, Jeff Law wrote:
On 9/15/20 1:47 PM, Martin Sebor wrote:
Overflowing the size of a dynamic allocation (e.g., malloc or VLA)
can lead to a subsequent buffer overflow corrupting the heap or
stack.  The attached patch diagnoses a subset of these cases where
the overflow/wraparound is still detectable.

Besides regtesting GCC on x86_64-linux I also verified the warning
doesn't introduce any false positives into Glibc or Binutils/GDB
builds on the same target.

Martin

gcc-96838.diff
PR middle-end/96838 - missing warning on integer overflow in callsto allocation functions
gcc/ChangeLog:

    PR middle-end/96838
    * calls.c (eval_size_vflow): New function.
    (get_size_range): Call it.  Add argument.
    (maybe_warn_alloc_args_overflow): Diagnose overflow/wraparound.
    * calls.h (get_size_range): Add argument.

gcc/testsuite/ChangeLog:

    PR middle-end/96838
    * gcc.dg/Walloc-size-larger-than-19.c: New test.
    * gcc.dg/Walloc-size-larger-than-20.c: New test.
If an attacker can control an integer overflow that feeds anallocation, then they can do all kinds of bad things. In fact, whenmy son was asking me attack vectors, this is one I said I'd look atif I were a bad guy.
I'm a bit surprised you can't just query the range of the argumentand get the overflow status out of that range, but I don't see thatin the APIs. How painful would it be to make that part of the API? The conceptual model would be to just ask for the range of theargument to malloc which would include the range and a status bitindicating the computation might have overflowed.

Do we know if it did/would have wrapped? sure. since we have to dothe math. so you are correct in that the information is there. but isit useful?

We are in the very annoying habit of subtracting one by adding0xFFFFFFF. which means you get an overflow for unsigned when yousubtract one. From what I have seen of unsigned math, we would beflagging very many operations as overflows, so you would still have thedifficulty of figuring out whether its a "real" overflow or a fake onebecause of the way we do unsigned math

At the very start, I did have an overflow flag in the range class... but it was turning out to be fairly useless so it was removed.

I agree that being able to evaluate an expression in an as-if
infinite precision (in addition to its type) would be helpful.

SO again, we get back to adding 0x0fffff when we are trying to subtractone... now, with infinite precision you are going to see

[2,10] - 1 we end up with [2,10]+0xFFFFFF, which will now giveyou [0x100000001, 0x100000009] so its going to look like it overflowed?

But just to make sure I understood correctly, let me ask again
using an example:

  void* f (size_t n)
  {
    if (n < PTRDIFF_MAX / 2)
      n = PTRDIFF_MAX / 2;

    return malloc (n * sizeof (int));
  }

Can the unsigned wraparound in the argument be readily detected?

On trunk, this ends up with the following:

  # RANGE [4611686018427387903, 18446744073709551615]
  _6 = MAX_EXPR <n_2(D), 4611686018427387903>;
  # RANGE [0, 18446744073709551615] NONZERO 18446744073709551612
  _1 = _6 * 4;
  ...
  p_5 = mallocD.1206 (_1); [tail call]
  ...
  return p_5;

so _1's range reflects the wraparound in size_t, but _6's range
has enough information to uncover it.  So detecting it is possible
and is done in the patch so we get a warning:

warning: argument 1 range [18446744073709551612, 0x3fffffffffffffffc]is too large to represent in ‘long unsigned int’[-Walloc-size-larger-than=]

    6 |   return malloc (n * sizeof (int));
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~

The code is very simplistic and only handles a small subset of cases.
It could be generalized and exposed by a more generic API but it does
seem like the ranger must already have all the logic built into it so
if it isn't exposed now it should be a matter of opening it up.


everything is exposed in range-ops.  well, mostly.
if we have _1 = _6 * 4

if one wanted to do that infinite precision, you query the range for _6,and the range for 4 (which would be [4,4] :-)

range_of_expr (op1r, _6, stmt)
range_of_expr (op2r, 4, stmt)

you could take their current types, and cast those ranges to whateverthe next higher precsion is,

range_cast  (op1r, highertype)
range_cast (op2r, highertype)
then invoke the operation on those parameters

gimple_range_fold (r, stmt,  op1r, op2r)

and that will do your operation in the higher precision. you couldcompare that to the value in regular precision too i suppose.

The ranger is designed to track ranges as they are represented in theIL. You are asking for us to interpret the IL as something other thanwhat is there, and increase the precision. Thats a different task.

could that be done? maybe. It might involve parallel tracking ofssa-Name ranges in a higher precision... and recalculating everyexpression using those values. Im not convinced that a generalizedranger which works in higher precision is really going to solve theproblem the way you hope it would.. i think we'd get a lot of false infowhen unsigned values are involved

If I were to see a clear solution/description as to what is aproblematic overflow and what isnt, there might be something moregeneral that could be done...

I mentioned it to Aldy in response to the irange best practices
document.  He says there's no way to do that and no plans to
make it possible.

Thats not quite what he said. . He said "If the operation may wrap oroverflow, it will show up as so in the range. I don't think we have anyplans of changing this behavior."

In fact. looking back, he said the same thing I just did.... There IS amechanism there for working in higher precision should one desire to doso, but as I pointed out above, Im not convinced as a general thing inthe ranger it will work how you imagine.

My guess is what you really want is to be invoking range-ops on stmtsyou care about whether they might overflow or not with higher precisionversions and then identify the cases which trigger that you actuallycare about.

So maybe you want an "overflow calculator" which is similar to theranger, and identifies which kinds of stmts expect certain range ofresults, and if they get results outside of the expected ranges,triggers a flag.maybe it could convert all unsigned and signed values to signed, andthen work in a higher precision, and then if any intermediate resultcant be accurately cast back to the original type, it gets flagged? isthat the kind of overflow that is cared about?

I dont know... this isnt my space :-) Im just telling you that I'mnot sure the current range query mechanism is the right place to beasking if an "overflow" occurred because I think the general answer wontbe useful.. too many kinds of overflows thrown together.

I think this requires more specifoc detailed information, and maybe asimple overflow analyzer will do the trick based on range-ops.... Ithink range-ops has the required abilities. whether it needs someenhancing, or something else exposed, I'm not sure yet



Andrew

PS One could always take a ranger and overload the range_of_stmt()routine that calculates results, so that it calls current functionality,then try converting the arguements to higher precision ranges andre-invoking range-ops on those arguments, get a new higher precisionrange back and then look at the results vs the normal ones. Might be aplace to start experimenting... then pass a flag along sayign anoverflow occurred.

Re: [PATCH] warn for integer overflow in allocation calls (PR 96838)

Reply via email to