On 18/06/14 01:08 PM, Gábor Lehel wrote: > # Exposition > > We've debated the subject of integer overflow quite a bit, without much > apparent progress. Essentially, we've been running in circles around two > core facts: wrapping is bad, but checking is slow. The current consensus > seems to be to, albeit grudgingly, stick with the status quo. > > I think we've established that a perfect, one-size-fits-all solution is > not possible. But I don't think this means that we're out of options, or > have no room for improvement. I think there are several imperfect, > partial solutions we could pursue, to address the various use cases in a > divide-and-conquer fashion. > > This is not a formal RFC, more of a grab bag of thoughts and ideas. > > The central consideration has to be the observation that, while wrapping > around on overflow is well-supported by hardware, for the large majority > of programs, it's the wrong behavior. > > Basically, programs are just hoping that overflow won't happen. And if > it ever does happen, it's going to result in unexpected behavior and > bugs. (Including the possibility of security bugs: not all security bugs > are memory safety bugs.) This is a step up from C's insanity of > undefined behavior for signed overflow, where the compiler assumes that > overflow *cannot* happen and even optimizes based on that assumption, > but it's still a sad state of affairs. If we're clearing the bar, that's > only because it's been buried underground. > > We can divide programs into three categories. (I'm using "program" in > the general sense of "piece of code which does a thing".) > > 1) Programs where wrapping around on overflow is the desired semantics. > > 2) Programs where wrapping around on overflow is not the desired > semantics, but performance is not critical.
If performance wasn't critical, the program wouldn't be written in Rust. The language isn't aimed at use cases where performance isn't a bug deal, as it makes many sacrifices to provide the level of control that's available. > 3) Programs where wrapping around on overflow is not the desired > semantics and performance is critical. > > Programs in (1) are well-served by the language and libraries as they > are, and there's not much to do except to avoid regressing. > > Programs in (2) and (3) are not as well-served. > > > # Checked math > > For (2), the standard library offers checked math in the `CheckedAdd`, > `CheckedMul` etc. traits, as well as integer types of unbounded size: > `BigInt` and `BigUint`. This is good, but it's not enough. The acid test > has to be whether for non-performance-critical code, people are actually > *using* checked math. If they're not, then we've failed. > > `CheckedAdd` and co. are important to have for flexibility, but they're > far too unwieldy for general use. People aren't going to write > `.checked_add(2).unwrap()` when they can write `+ 2`. A more adequate > design might be something like this: > > * Have traits for all the arithmetic operations for both checking on > overflow and for wrapping around on overflow, e.g. `CheckedAdd` (as > now), `WrappingAdd`, `CheckedSub`, `WrappingSub`, and so on. > > * Offer convenience methods for the Checked traits which perform > `unwrap()` automatically. > > * Have separate sets of integer types which check for overflow and > which wrap around on overflow. Whatever they're called: `CheckedU8`, > `checked::u8`, `uc8`, ... > > * Both sets of types implement all of the Checked* and Wrapping* > traits. You can use explicit calls to get either behavior with either types. > > * The checked types use the Checked traits to implement the operator > overloads (`Add`, Mul`, etc.), while the wrapping types use the Wrapping > traits to implement them. In other words, the difference between the > types is (probably only) in the behavior of the operators. > > * `BigInt` also implements all of the Wrapping and Checked traits: > because overflow never happens, it can claim to do anything if it "does > happen". `BigUint` implements all of them except for the Wrapping traits > which may underflow, such as `WrappingSub`, because it has nowhere to > wrap around to. > > Another option would be to have just one set of types but two sets of > operators, like Swift does. I think that would work as well, or even > better, but we've been avoiding having any operators which aren't > familiar from C. > > > # Unbounded integers > > While checked math helps catch instances of overflow and prevent > misbehaviors and bugs, many programs would prefer integer types which do > the right thing and don't overflow in the first place. For this, again, > we currently have `BigInt` and `BigUint`. There's one problem with > these: because they may allocate, they no longer `Copy`, which means > that they can't be just drop-in replacements for the fixed-size types. > > > To partially address this, once we have tracing GC, and if we manage to > make `Gc<T>: Copy`, we should add unbounded `Integer` (as in Haskell) > and `Natural` types which internally use `Gc`, and so are also `Copy`. > (In exchange, they wouldn't be `Send`, but that's far less pervasive.) > These would (again, asides from `Send`) look and feel just like the > built-in fixed-size types, while having the semantics of actual > mathematical integers, resp. naturals (up to resource exhaustion of > course). They would be ideal for code which is not performance critical > and doesn't mind incurring, or already uses, garbage collection. For > those cases, you wouldn't have to think about the tradeoffs, or make > difficult choices: `Integer` is what you use. A tracing garbage collector for Rust is a possibility but not a certainty. I don't think it would make sense to have `Gc<T>` support `Copy` but have it left out for `Rc<T>`. The fact that an arbitrary compiler decision like that would determine the most convenient type is a great reason to avoid making that arbitrary choice. There's no opportunity for cycles in integers, and `Rc<T>` will be faster along with using far less memory. It doesn't have the overhead associated with reference counting in other languages due to being task-local (not atomic) and much of the reference counting is elided by move semantics / borrows. With the help of sized deallocation, Rust can have an incredibly fast allocator implementation. Since `Rc<T>` is task-local, it also doesn't need to be using the same allocator entry point as sendable types. It can make use of a thread-local allocator with less complexity and overhead, although this could also be available on an opt-in basis for sendable types by changing the allocator parameter from the default. > One concern with this would be the possibility of programs incurring GC > accidentally by using these types. There's several ways to deal with this: > > * Note the fact that they use GC prominently in the documentation. > > * Make sure the No-GC lint catches any use of them. > > * Offer a "no GC in this task" mode which fails the task if GC > allocation is invoked, to catch mistakes at runtime. > > I think these would be more than adequate to address the concern. I don't think encouraging tracing garbage collection is appropriate for a language designed around avoiding it. It would be fine to have it as a choice if it never gets in the way, but it shouldn't be promoted as a default. > # Between a rock and a hard place > > Having dispatched the "easy" cases above, for category #3 we're left > between the rock (wraparound on overflow is wrong) and the hard place > (checking for overflow is slow). > > Even here, we may have options. > > An observation: > > * We are adamantly opposed to compiler switches to turn off array > bounds checking, because we are unwilling to compromise memory safety. > > * We are relatively unbothered by unchecked arithmetic, because it > *doesn't* compromise memory safety. > > Taking these together, I think we should at least be less than adamantly > opposed to compiler switches for enabling or disabling checked arithmetic. I think compiler switches or attributes enabling a different dialect of the language are a bad idea as a whole. Code from libraries is directly mixed into other crates, so changing the semantics of the language is inherently broken. > Consider the status quo. People use integer types which wrap on > overflow. If they ever do overflow, it means misbehaviors and bugs. If > we had a compiler flag to turn on checked arithmetic, even if it were > only used a few times in testing, it would be a strict improvement: more > bugs would be caught with less effort. > > But we clearly can't just add this flag for existing types, because > they're well-defined to wrap around on overflow, and some programs > (category #1) rely on this. So we need to have separate types. > > One option is therefore to just define this set of types as failing the > task on overflow if checked arithmetic is enabled, and to wrap around if > it's disabled. But it doesn't necessarily make sense to specify > wraparound in the latter case, given that programs are not supposed to > depend on it: they may be compiled with either flag, and should avoid > overflow. > > Another observation: > > * Undefined behavior is anathema to the safe subset of the language. > That would mean that it's not safe. > > * But *unspecified results* are maybe not so bad. We might already have > them for bit-shifts. (Question for the audience: do we?) > > If unspecified results are acceptable, then we could instead say that > these types fail on overflow if checked arithmetic is enabled, and have > unspecified results if it isn't. But saying they wrap around is fine as > well. > > This way, we can put off the choice between the rock and the hard place > from program-writing time to compile time, at least. > > > # Defaults > > Even if we provide the various options from above, from the perspective > of what types people end up using, defaults are very important. > > There's two kinds of defaults: > > * The de jure default, inferred by the type system in the absence of > other information, which used to be `int`. Thankfully, we're removing this. > > * The de facto, cultural default. For instance, if there is a type > called "int", most people will use it without thinking. > > The latter question is still something we need to think about. Should we > have a clear cultural default? Or should we force people to explicitly > choose between checked and wrapping arithmetic? > > For the most part, this boils down to: > > * If `int` is checked, the default is slow > > * If `int` wraps, the default is wrong > > * If there is no `int`, people are confused > > Regarding the first, we seem to be living in deathly fear of someone > naively writing an arithmetic benchmark in Rust, putting it up on the > internet, and saying, "Look: Rust is slow". This is not an unrealistic > scenario, sadly. The question is whether we'd rather have more programs > be accidentally incorrect in order to avoid bad publicity from > benchmarks being accidentally slow. > > Regarding the third, even if the only types we have are `intc`, `intw`, > `ic8`, `iw8`, and so on, we *still* can't entirely escape creating a > cultural default, because we still need to choose types for functions in > the standard library and for built-in operations like array indexing. > Then the path of least resistance for any given program will be to use > the same types. > > There's one thing that might help resolve this conundrum, which is if we > consider the previously-described scheme with compiler flags to control > checked arithmetic to be acceptable. In that case, I think those types > would be the clear choice to be the de facto defaults. Then we would have: > > * `i8`, `u8` .. `i64`, `u64`, `int`, and `uint` types, which fail the > task on overflow if checked arithmetic is turned on, and either wrap > around or have an unspecified result if it's off > > * a corresponding set of types somewhere in the standard library, which > wrap around no matter what > > * and another set of corresponding types, which are checked no matter what > > > -Gábor
signature.asc
Description: OpenPGP digital signature
_______________________________________________ Rust-dev mailing list [email protected] https://mail.mozilla.org/listinfo/rust-dev
