On Wed, Jun 18, 2014 at 8:20 PM, Daniel Micay <[email protected]> wrote:
> On 18/06/14 01:08 PM, Gábor Lehel wrote:
> > # Exposition
> >
> > We've debated the subject of integer overflow quite a bit, without much
> > apparent progress. Essentially, we've been running in circles around two
> > core facts: wrapping is bad, but checking is slow. The current consensus
> > seems to be to, albeit grudgingly, stick with the status quo.
> >
> > I think we've established that a perfect, one-size-fits-all solution is
> > not possible. But I don't think this means that we're out of options, or
> > have no room for improvement. I think there are several imperfect,
> > partial solutions we could pursue, to address the various use cases in a
> > divide-and-conquer fashion.
> >
> > This is not a formal RFC, more of a grab bag of thoughts and ideas.
> >
> > The central consideration has to be the observation that, while wrapping
> > around on overflow is well-supported by hardware, for the large majority
> > of programs, it's the wrong behavior.
> >
> > Basically, programs are just hoping that overflow won't happen. And if
> > it ever does happen, it's going to result in unexpected behavior and
> > bugs. (Including the possibility of security bugs: not all security bugs
> > are memory safety bugs.) This is a step up from C's insanity of
> > undefined behavior for signed overflow, where the compiler assumes that
> > overflow *cannot* happen and even optimizes based on that assumption,
> > but it's still a sad state of affairs. If we're clearing the bar, that's
> > only because it's been buried underground.
> >
> > We can divide programs into three categories. (I'm using "program" in
> > the general sense of "piece of code which does a thing".)
> >
> > 1) Programs where wrapping around on overflow is the desired semantics.
> >
> > 2) Programs where wrapping around on overflow is not the desired
> > semantics, but performance is not critical.
>
> If performance wasn't critical, the program wouldn't be written in Rust.
> The language isn't aimed at use cases where performance isn't a big
> deal, as it makes many sacrifices to provide the level of control that's
> available.

People write GUI frameworks and applications in C++ and even in C. Just
because a language is appropriate for low-level and performance-critical
code doesn't mean it needs to be inappropriate for anything else. I think
Rust is far superior as a general-purpose language to most of today's
mainstream languages. And even in applications where some parts are
performance-critical, many parts may not be. I expect the ratios may be
tilted differently for Rust code, but not the fundamental pattern to be
different.

> > 3) Programs where wrapping around on overflow is not the desired
> > semantics and performance is critical.
> >
> > Programs in (1) are well-served by the language and libraries as they
> > are, and there's not much to do except to avoid regressing.
> >
> > Programs in (2) and (3) are not as well-served.
> >
> >
> > # Checked math
> >
> > For (2), the standard library offers checked math in the `CheckedAdd`,
> > `CheckedMul` etc. traits, as well as integer types of unbounded size:
> > `BigInt` and `BigUint`. This is good, but it's not enough. The acid test
> > has to be whether, for non-performance-critical code, people are actually
> > *using* checked math. If they're not, then we've failed.
> >
> > `CheckedAdd` and co. are important to have for flexibility, but they're
> > far too unwieldy for general use. People aren't going to write
> > `.checked_add(2).unwrap()` when they can write `+ 2`. A more adequate
> > design might be something like this:
> >
> > * Have traits for all the arithmetic operations for both checking on
> > overflow and for wrapping around on overflow, e.g. `CheckedAdd` (as
> > now), `WrappingAdd`, `CheckedSub`, `WrappingSub`, and so on.
> >
> > * Offer convenience methods for the Checked traits which perform
> > `unwrap()` automatically.
> >
> > * Have separate sets of integer types which check for overflow and
> > which wrap around on overflow. Whatever they're called: `CheckedU8`,
> > `checked::u8`, `uc8`, ...
> >
> > * Both sets of types implement all of the Checked* and Wrapping*
> > traits. You can use explicit calls to get either behavior with either
> > type.
> >
> > * The checked types use the Checked traits to implement the operator
> > overloads (`Add`, `Mul`, etc.), while the wrapping types use the Wrapping
> > traits to implement them. In other words, the difference between the
> > types is (probably only) in the behavior of the operators.
> >
> > * `BigInt` also implements all of the Wrapping and Checked traits:
> > because overflow never happens, it can claim to do anything if it "does
> > happen". `BigUint` implements all of them except for the Wrapping traits
> > which may underflow, such as `WrappingSub`, because it has nowhere to
> > wrap around to.
> >
> > Another option would be to have just one set of types but two sets of
> > operators, like Swift does. I think that would work as well, or even
> > better, but we've been avoiding having any operators which aren't
> > familiar from C.
> >
> >
> > # Unbounded integers
> >
> > While checked math helps catch instances of overflow and prevent
> > misbehaviors and bugs, many programs would prefer integer types which do
> > the right thing and don't overflow in the first place. For this, again,
> > we currently have `BigInt` and `BigUint`. There's one problem with
> > these: because they may allocate, they are no longer `Copy`, which means
> > that they can't be just drop-in replacements for the fixed-size types.
> >
> > To partially address this, once we have tracing GC, and if we manage to
> > make `Gc<T>: Copy`, we should add unbounded `Integer` (as in Haskell)
> > and `Natural` types which internally use `Gc`, and so are also `Copy`.
> > (In exchange, they wouldn't be `Send`, but that's far less pervasive.)
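To make the checked-types idea above concrete, here's a minimal sketch in
today's syntax. The `Checked` name and the fail-on-overflow choice are
purely illustrative, not a settled design; it simply routes the `+`
operator through `checked_add` so that overflow fails loudly instead of
wrapping:

```rust
use std::ops::Add;

// Hypothetical checked integer type: `+` fails on overflow instead of
// wrapping. `Checked` is an illustrative name, not an actual library type.
#[derive(Copy, Clone, Debug, PartialEq)]
struct Checked(u8);

impl Add for Checked {
    type Output = Checked;
    fn add(self, rhs: Checked) -> Checked {
        // `checked_add` returns `None` on overflow, which we turn into a
        // task failure rather than a silently wrong result.
        Checked(self.0.checked_add(rhs.0).expect("u8 overflow"))
    }
}

fn main() {
    assert_eq!(Checked(200) + Checked(55), Checked(255)); // still fits in u8
    // `Checked(200) + Checked(56)` would fail with "u8 overflow".
}
```

A wrapping sibling would be identical except for calling `wrapping_add`
in its `Add` impl, which is the sense in which the two sets of types
differ (probably only) in their operators.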
> > These would (again, aside from `Send`) look and feel just like the
> > built-in fixed-size types, while having the semantics of actual
> > mathematical integers, resp. naturals (up to resource exhaustion of
> > course). They would be ideal for code which is not performance critical
> > and doesn't mind incurring, or already uses, garbage collection. For
> > those cases, you wouldn't have to think about the tradeoffs, or make
> > difficult choices: `Integer` is what you use.
>
> A tracing garbage collector for Rust is a possibility but not a
> certainty. I don't think it would make sense to have `Gc<T>` support
> `Copy` but have it left out for `Rc<T>`. The fact that an arbitrary
> compiler decision like that would determine the most convenient type is
> a great reason to avoid making that arbitrary choice.
>
> There's no opportunity for cycles in integers, and `Rc<T>` will be
> faster along with using far less memory. It doesn't have the overhead
> associated with reference counting in other languages due to being
> task-local (not atomic), and much of the reference counting is elided by
> move semantics / borrows.
>
> With the help of sized deallocation, Rust can have an incredibly fast
> allocator implementation. Since `Rc<T>` is task-local, it also doesn't
> need to be using the same allocator entry point as sendable types. It
> can make use of a thread-local allocator with less complexity and
> overhead, although this could also be available on an opt-in basis for
> sendable types by changing the allocator parameter from the default.

> > One concern with this would be the possibility of programs incurring GC
> > accidentally by using these types. There are several ways to deal with
> > this:
> >
> > * Note the fact that they use GC prominently in the documentation.
> >
> > * Make sure the No-GC lint catches any use of them.
> >
> > * Offer a "no GC in this task" mode which fails the task if GC
> > allocation is invoked, to catch mistakes at runtime.
> >
> > I think these would be more than adequate to address the concern.
>
> I don't think encouraging tracing garbage collection is appropriate for
> a language designed around avoiding it. It would be fine to have it as a
> choice if it never gets in the way, but it shouldn't be promoted as a
> default.

The idea is to pick the low-hanging fruit. For programs that use garbage
collection, we can offer an integer type that requires neither ergonomic
nor semantic compromises. So let's.

> > # Between a rock and a hard place
> >
> > Having dispatched the "easy" cases above, for category #3 we're left
> > between the rock (wraparound on overflow is wrong) and the hard place
> > (checking for overflow is slow).
> >
> > Even here, we may have options.
> >
> > An observation:
> >
> > * We are adamantly opposed to compiler switches to turn off array
> > bounds checking, because we are unwilling to compromise memory safety.
> >
> > * We are relatively unbothered by unchecked arithmetic, because it
> > *doesn't* compromise memory safety.
> >
> > Taking these together, I think we should at least be less than adamantly
> > opposed to compiler switches for enabling or disabling checked
> > arithmetic.
>
> I think compiler switches or attributes enabling a different dialect of
> the language are a bad idea as a whole. Code from libraries is directly
> mixed into other crates, so changing the semantics of the language is
> inherently broken.

Even if checked arithmetic is only turned on for testing/debugging and
not in production code, it's still a strict improvement over the status
quo. Under the status quo, except where wraparound is the intended
semantics, overflow is silently wrong 100% of the time. With the
alternative, that percentage is smaller.

> > Consider the status quo. People use integer types which wrap on
> > overflow. If they ever do overflow, it means misbehaviors and bugs.
> > If we had a compiler flag to turn on checked arithmetic, even if it were
> > only used a few times in testing, it would be a strict improvement: more
> > bugs would be caught with less effort.
> >
> > But we clearly can't just add this flag for existing types, because
> > they're well-defined to wrap around on overflow, and some programs
> > (category #1) rely on this. So we need to have separate types.
> >
> > One option is therefore to just define this set of types as failing the
> > task on overflow if checked arithmetic is enabled, and to wrap around if
> > it's disabled. But it doesn't necessarily make sense to specify
> > wraparound in the latter case, given that programs are not supposed to
> > depend on it: they may be compiled with either flag, and should avoid
> > overflow.
> >
> > Another observation:
> >
> > * Undefined behavior is anathema to the safe subset of the language.
> > That would mean that it's not safe.
> >
> > * But *unspecified results* are maybe not so bad. We might already have
> > them for bit-shifts. (Question for the audience: do we?)
> >
> > If unspecified results are acceptable, then we could instead say that
> > these types fail on overflow if checked arithmetic is enabled, and have
> > unspecified results if it isn't. But saying they wrap around is fine as
> > well.
> >
> > This way, we can put off the choice between the rock and the hard place
> > from program-writing time to compile time, at least.
> >
> >
> > # Defaults
> >
> > Even if we provide the various options from above, from the perspective
> > of what types people end up using, defaults are very important.
> >
> > There are two kinds of defaults:
> >
> > * The de jure default, inferred by the type system in the absence of
> > other information, which used to be `int`. Thankfully, we're removing
> > this.
> >
> > * The de facto, cultural default. For instance, if there is a type
> > called "int", most people will use it without thinking.
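For illustration, the flag-dependent scheme described above ("fail the
task on overflow if checked arithmetic is enabled, wrap or leave the
result unspecified if it isn't") could look something like the following
sketch, which uses the debug-assertions flag purely as a stand-in for a
dedicated checked-arithmetic compiler switch:

```rust
// Sketch of flag-dependent overflow behavior. `debug_assertions` stands in
// for the hypothetical checked-arithmetic flag discussed above.
fn flag_dependent_add(a: u8, b: u8) -> u8 {
    if cfg!(debug_assertions) {
        // Flag on: catch the bug by failing the task.
        a.checked_add(b).expect("u8 overflow")
    } else {
        // Flag off: wrap around (or treat the result as unspecified).
        a.wrapping_add(b)
    }
}

fn main() {
    assert_eq!(flag_dependent_add(2, 3), 5);
    // With the flag off, flag_dependent_add(200, 56) wraps to 0 (256 mod
    // 256); with it on, the same call fails the task instead.
}
```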
> >
> > The latter question is still something we need to think about. Should we
> > have a clear cultural default? Or should we force people to explicitly
> > choose between checked and wrapping arithmetic?
> >
> > For the most part, this boils down to:
> >
> > * If `int` is checked, the default is slow.
> >
> > * If `int` wraps, the default is wrong.
> >
> > * If there is no `int`, people are confused.
> >
> > Regarding the first, we seem to be living in deathly fear of someone
> > naively writing an arithmetic benchmark in Rust, putting it up on the
> > internet, and saying, "Look: Rust is slow". This is not an unrealistic
> > scenario, sadly. The question is whether we'd rather have more programs
> > be accidentally incorrect in order to avoid bad publicity from
> > benchmarks being accidentally slow.
> >
> > Regarding the third, even if the only types we have are `intc`, `intw`,
> > `ic8`, `iw8`, and so on, we *still* can't entirely escape creating a
> > cultural default, because we still need to choose types for functions in
> > the standard library and for built-in operations like array indexing.
> > Then the path of least resistance for any given program will be to use
> > the same types.
> >
> > There's one thing that might help resolve this conundrum, which is if we
> > consider the previously-described scheme with compiler flags to control
> > checked arithmetic to be acceptable. In that case, I think those types
> > would be the clear choice to be the de facto defaults. Then we would
> > have:
> >
> > * `i8`, `u8` ..
> > `i64`, `u64`, `int`, and `uint` types, which fail the
> > task on overflow if checked arithmetic is turned on, and either wrap
> > around or have an unspecified result if it's off
> >
> > * a corresponding set of types somewhere in the standard library, which
> > wrap around no matter what
> >
> > * and another set of corresponding types, which are checked no matter
> > what
> >
> >
> > -Gábor
>
> _______________________________________________
> Rust-dev mailing list
> [email protected]
> https://mail.mozilla.org/listinfo/rust-dev
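P.S. One footnote on the "wrap around no matter what" set in the list
above: it can be prototyped today as a thin library wrapper whose
operators call the `wrapping_*` methods. (`std::num::Wrapping` in the
modern standard library is exactly such a type.) A minimal example:

```rust
use std::num::Wrapping;

// `Wrapping<T>` overloads the arithmetic operators with wrap-on-overflow
// semantics, i.e. the "wrap around no matter what" set of types.
fn main() {
    let x = Wrapping(250u8) + Wrapping(10u8);
    assert_eq!(x.0, 4); // 260 mod 256

    let y = Wrapping(255u8) + Wrapping(1u8);
    assert_eq!(y.0, 0); // wraps past the top of u8
}
```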
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
