Re: GCC optimizes integer overflow: bug or feature?

Michael Veksler Thu, 21 Dec 2006 07:32:24 -0800

Paolo Bonzini wrote:

Some time ago (a year?) I was told on this mailing-list that code breakage due to undefinedness of signed overflow is not too common (I at least claimed with no evidence that it was more than one bug per 1,000 lines). My claim was counterclaimed by something like "most of the time people work with small enough values, so overflows don't happen too many times in practice".
This claim is a non sequitur.
What I would say is, people *don't understand why a compiler needs to assume undefined overflow semantics*, because people work with small values and don't care about the boundary conditions.

But that is generally true for stuff like gets/sprintf/strcpy/unsafe operation du jour. People don't care about the overflow cases of sprintf because they work with small data. Yet then, suddenly, the size of the data grows and funny stuff starts to happen *due* to their sprintf et al.

For example, most programmers that know assembly language will appreciate if the compiler can use the processor's support for loop with a known number of iterations (mtctr on PPC, for example). However, I'm pretty sure that, if you present these two pieces of code to some good programmers,
  unsigned int i;
  for (i = 0; i <= n; i++)
    ...

  unsigned int i;
  for (i = 0; i < n; i++)
    ...
where the compiler uses mtctr only in the second case, most of them will think that the compiler has a bug. Almost nobody will realize that the first can loop infinitely, and the second cannot (which is the reason why the compiler cannot optimize them in the same way).
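The difference between the two loops is easier to see with a narrower counter. What follows is only a sketch, not code from the thread: it assumes an 8-bit `unsigned char`, and the names `count_le`, `count_lt` and the `cap` parameter are hypothetical. The `cap` exists only so that the demonstration itself terminates.

```c
#include <assert.h>
#include <limits.h>

/* Counts iterations of `for (i = 0; i <= n; i++)` with an 8-bit counter.
   Gives up after `cap` iterations, because with n == UCHAR_MAX the
   condition `i <= n` is always true and the loop would never exit. */
unsigned long count_le(unsigned char n, unsigned long cap)
{
    unsigned long iters = 0;
    unsigned char i;

    assert(cap > 0);
    for (i = 0; i <= n; i++) {      /* i wraps from 255 back to 0 */
        if (++iters == cap)
            break;
    }
    return iters;
}

/* Counts iterations of `for (i = 0; i < n; i++)`: always exactly n,
   so the trip count is known before the loop starts. */
unsigned long count_lt(unsigned char n)
{
    unsigned long iters = 0;
    unsigned char i;

    for (i = 0; i < n; i++)
        iters++;
    return iters;
}
```

With n = 254 the `<=` loop runs 255 times and stops; with n = 255 it only stops because of the cap. The `<` loop has no such corner case, which is why a counted-loop instruction is safe for it.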

Unfortunately, some compilers (not gcc) would optimize both cases the same way.

Well, these programmers *are* assuming undefined overflow semantics even on unsigned types. Maybe they would like overflow semantics to be defined in some cases and undefined in others? Fine by me, that would be -fwrapv -funsafe-loop-optimizations in GCC; but a language standard cannot go to such detail!

It could add the type modifier "nowrap", such that loop indexes can be marked this way explicitly, signed or unsigned.

On the autoconf mailing list, Paul Eggert mentioned as a good compromise that GCC could treat signed overflow as undefined only for loops and not in general. Except that the original gnulib bug report was in a loop, so this compromise would leave that case undefined.

I'd vote for Paul's suggestion. It would have the least-surprise effect.

You may think that the analogy is far fetched? In that case, I'll pick some gcc source file at random and look for signed operations in it:
categorize_ctor_elements_1(......) in gcc/expr.c:
_elts += mult * nz;
elt_count += mult * ic;
Both assume that neither the multiplication, nor the addition overflow.
I see this as a *counterexample*: it shows that programmers don't care about having wrapping overflow, in fact they don't care about overflow at all. This code is incorrect not only if overflow is undefined, but also if overflow wraps (-fwrapv); it is correct if overflow aborts (-ftrapv).

No, it is an example of a bug which is difficult to protect against with an assertion. Even more seriously, instead of a mere bug with bad behavior we get escalated to undefined behavior. "Undefined" is the worst kind, since in theory it is allowed to clean up your disk, explode your computer and start WW-III. Can this happen in this case? It depends on what the caller does and on whether gcc can see the caller and the callee at the same time (e.g. if gcc knows that a caller causes mult=MAX_INT/2, it might assume that nz <= 2 for that caller, and perform some nasty transformations down the line).
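To make concrete what "protecting" a statement of the form `acc += mult * n` takes, here is a minimal sketch of a guard written with pre-checks only, so that no overflowing operation is ever executed. It is not taken from gcc/expr.c: the helper name `checked_mul_add`, the use of `long`, and the restriction to non-negative operands are all assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: performs *acc += mult * n only when neither the
   multiplication nor the addition can overflow. Returns false and leaves
   *acc untouched otherwise. Assumes all three values are non-negative;
   anything else is rejected rather than handled. */
bool checked_mul_add(long *acc, long mult, long n)
{
    long prod;

    assert(acc);
    if (*acc < 0 || mult < 0 || n < 0)
        return false;               /* outside the assumed domain */
    if (n != 0 && mult > LONG_MAX / n)
        return false;               /* mult * n would overflow */
    prod = mult * n;
    if (*acc > LONG_MAX - prod)
        return false;               /* *acc + prod would overflow */
    *acc += prod;
    return true;
}
```

Each check has to be phrased in terms of values that are themselves safe to compute, which is a large part of why such guards are rarely written by hand.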

I am not saying that GCC should abandon all optimizations that assume that no execution path gets to undefined cases. I am saying that these things should not be taken lightly. Saying "programmers don't care about having ****[something leading to undefined behavior]" is simply ignoring the gravity of the bad effects. Taking this seriously includes VRP, which, due to its lacking data structure (at least it used to be so), would be much less effective if gcc were to assume wrapping int.

Instead of having to choose between the bad alternatives of VRP that gives weird results for undefined cases or does barely anything, its data structure should be improved, and each variable should have a set of ranges instead of a single range (like what I have seen in some Constraints/CP papers).
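As a rough illustration of why a set of ranges helps under wrapping semantics, consider adding a non-negative constant to a value with a known range. The sketch below is not GCC's VRP data structure; the types `range` and `range_set` and the function `add_const_wrapping` are hypothetical, and the set is limited to two intervals because that is all this one operation needs.

```c
#include <assert.h>
#include <limits.h>

struct range { int lo, hi; };                     /* inclusive bounds */
struct range_set { int n; struct range r[2]; };   /* n ranges, ascending */

/* Possible values of x + c under wrapping arithmetic, for x in
   [in.lo, in.hi] and c >= 0. All bounds are computed without ever
   performing an overflowing signed operation. */
struct range_set add_const_wrapping(struct range in, int c)
{
    struct range_set out = {0, {{0, 0}, {0, 0}}};
    int t;

    assert(c >= 0 && in.lo <= in.hi);
    t = INT_MAX - c;                  /* largest x for which x + c fits */

    if (in.hi > t) {                  /* some values wrap to the bottom */
        int first = in.lo > t ? in.lo : t + 1;
        out.r[out.n].lo = INT_MIN + (first - t - 1);
        out.r[out.n].hi = INT_MIN + (in.hi - t - 1);
        out.n++;
    }
    if (in.lo <= t) {                 /* some values do not wrap */
        int last = in.hi < t ? in.hi : t;
        out.r[out.n].lo = in.lo + c;
        out.r[out.n].hi = last + c;
        out.n++;
    }
    return out;
}
```

For x in [0, INT_MAX] and c = 1 this gives the two ranges [INT_MIN, INT_MIN] and [1, INT_MAX]. A representation with a single range would have to widen that to [INT_MIN, INT_MAX], which carries no information at all.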


------------

The second point in this example is:

Checking for buffer overflows (undefined code) before they occur is trivial in most cases. Checking for a wrapping "signed int" before it actually wraps is relatively complex and unintuitive (e.g. it is unintuitive why a/b should ever overflow, hint: MIN_INT/-1 --> MIN_INT). I would prefer plain bugs, in which -g and -O3 act just the same, over undefined behavior when I have no choice but to debug -O3 code! I can't forget the day (several years ago) when I had to debug my MyAllocator<T> class inside


   std::set<int,std::less<int>, MyAllocator<int> >::insert(int);

which would crash only with -O2 on PPC, after some sequence of inserts and erases (a dumb strict-aliasing bug, which was tough to hunt down). It was not pretty. This bug should be handed to all those who would like to expand the meaning of "undefined". They should try to debug it, and then say if they would still want to add more "undefined" clauses to the standard. The new gcc warnings could have helped me in this case.
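Coming back to the remark about checking a "signed int" before it wraps: the following is a minimal sketch of such pre-checks for addition and division, using INT_MIN from <limits.h> for what the text calls MIN_INT. The helper names `checked_add` and `checked_div` are hypothetical, and this is one way to write the tests, not the only one.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *out and returns true, or returns false if the
   mathematical sum does not fit in an int. The bounds are rearranged so
   that the check itself cannot overflow. */
bool checked_add(int a, int b, int *out)
{
    assert(out);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}

/* Stores a / b in *out and returns true, or returns false for the two
   cases where the division is undefined: b == 0, and INT_MIN / -1,
   whose true result (INT_MAX + 1) is not representable. */
bool checked_div(int a, int b, int *out)
{
    assert(out);
    if (b == 0 || (a == INT_MIN && b == -1))
        return false;
    *out = a / b;
    return true;
}
```

Compare this with a typical buffer check, which is usually a single comparison against a length; here even the addition needs two sign-dependent cases.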



 Michael
