On 02/04/14 11:35 AM, Alex Crichton wrote: > I've noticed recently that there seems to be a bit of confusion about the fate > of ~[T] with an impending implementation of DST on the horizon. This has been > accompanied with a number of pull requests to completely remove many uses of > ~[T] throughout the standard distribution. I'd like to take some time to > straighten out what's going on with Vec<T> and ~[T].
I think this is a difference of opinion, not "confusion". The original pull requests switching `~[T]` to `Vec<T>` were done by pcwalton, and this was with full knowledge of the plans for `~[T]`. > # Vec<T> > > In a post-DST world, Vec<T> will be the "vector builder" type. It will be the > only type for building up a block of contiguous elements. This type exists > today, and lives inside of std::vec. Today, you cannot index Vec<T>, but this > will be enabled in the future once the indexing traits are fleshed out. It will be Rust's vector (dynamic array) type. I don't think it makes sense to call it a 'builder' any more than it makes sense to call `HashMap<K, V>` a 'hash table builder'. It makes something simple far more complicated than it needs to be. > This type will otherwise largely not change from what it is today. It will > continue to occupy three words in memory, and continue to have the same > runtime > semantics. > > # ~[T] > > The type ~[T] will still exist in a post-DST, but its representation will > change. Today, a value of type ~[T] is one word (I'll elide the details of > this > for now). After DST is implemented, ~[T] will be a two-word value of the > length > and a pointer to an array (similarly to what slices are today). The ~[T] type > will continue to have move semantics, and you can borrow it to &[T] as usual. The `~[T]` type will exist because `[T]` will exist as a type. It won't be an explicit choice to support having it. Some of us consider it an unfortunate consequence of DST rather than a useful type. > The major difference between today's ~[T] type and a post-DST ~[T] is that the > push() method will be removed. There is no knowledge of a capacity in the > representation of a ~[T] value, so a push could not be supported at all. In > theory a pop() can be efficiently supported, but it will likely not be > implemented at first. A `pop` or `shift` function is impossible to implement efficiently if allocators require a size to be passed to `free`. > # [T] > > As part of DST, the type grammar will start accepting [T] as a possible > substitute for type parameters. This basically means that if your type > parameters is &T, then &[U] can satisfy the type parameter. > > While possible, I imagine that it will be rare for this to appear in apis. > This > is an unsized type, which means that it's more limited what you can do with it > than you can with a sized type. > > The full details of [T] will become apparent once DST is implemented, but it's > safe to say that APIs and usage should rarely have to deal with this type, and > it will likely be mostly transparent. > > # Converting between Vec<T> and ~[T] > > Conversions between these two types will be provided, and the default > implementations will be free. Converting from Vec<T> to ~[T] will be simply > forgetting the capacity, and converting from ~[T] to Vec<T> will set the > capacity to the length. Converting from `Vec<T>` to `~[T]` will not be free with an efficient allocation scheme. I don't think Rust will want to be using a legacy `malloc`/`free` style API as the underlying default allocator in the future. I see it only as a temporary measure before a modern allocation model is implemented. Without a size parameter to `free`, an allocator needs to track the size of allocations manually. It increases the memory overhead, along with adding bookkeeping overhead. C++ allocators take a `size` parameter to the `deallocate` function for this reason and I expect Rust will want to do the same. The design of `malloc` and `free` is far from ideal, because the length is either known statically or dynamically in nearly every case. I think leaving out the capacity field of vectors in some cases without dropping the excess capacity is an an insignificant micro-optimization. In contract, passing the length to `free` is quite valuable and will result in a measurable performance win across nearly all Rust code with an allocator taking advantage of it. > Helper methods will likely be provided to perform a forceful reallocating > shrink when going from Vec<T> to ~[T], but it will not be the default. It has to be the *only* way to do it if Rust is going to be able to switch to an efficient allocation model in the future. The API of `malloc`, `realloc` and `free` is purely a legacy wart and shouldn't drive the design of a new language/library. > ## The cost of Vec<T> => ~[T] > > Some concerns have been brought up that this can in theory be a costly > transition under the assumption that this does a reallocation of memory to > shrink to the capacity to exactly the length. This will likely not be the > default implementation. I think it likely will be the *only* implementation. > Some concerns have then been brought up that some allocators require the size > of the allocation to be passed to free(), and that this model is incompatible > with that flavor of allocator. We believe that this fear can be > alleviated with a "shrink if necessary" method on allocators. The default > allocator (backed by the system malloc) would be a no-op because the size to > free is not used. Allocators which use the size passed to free would actually > perform a reallocation. The default allocator will be taking a length to `free` if it's an efficient implementation. There's no reason for Rust to remain tied to the obsolete `malloc`, `realloc` and `free` API forever. > # Choosing between Vec<T> and ~[T] > > Primarily, if you need a growable vector, you should use Vec<T>. If you do not > need a growable vector, but you're instead just dealing with an array of > items, > then you should use ~[T]. > > As a concrete example, I'll take the read_to_end() method on io's Reader > trait. > This type must use a Vec<T> internally to read data into the vector, but it > will > return a ~[T] because the contents are conceptually frozen after they have > been > read. > > There is no blanket right decision to choose between Vec<T> and ~[T], this > will > need to be done on a case-by-case basis to evaluate whether apis should take > or > consume Vec<T> or ~[T]. > > # Moving Forward > > In order to implement DST, it is not necessary to remove all usage of ~[T] > today. It is necessary to remove all *growable* usage of ~[T], however. All > uses > of vectors which need growable or shrinkable vectors need to switch to Vec<T>. > If a vector does not need to be grown or shrunk, it can remain as ~[T]. If `~[T]` remains used throughout the libraries, Rust will become noisier than languages like C++ with a unified vector type. The need to convert between `Vec<T>` and `~[T]` would add noise to lots of code, without any adding measurable optimization win. A micro-optimization shouldn't drive the design of the libraries, especially when it will prevent making a significant *macro*-optimization (passing a length to the deallocation function). > Concretely speaking, the next steps forward for ~[T] would entail: > > * Add a Vec<T> -> ~[T] conversion. This will be an expensive conversion today > because it requires an allocation (due to the layout of today's ~[T]), but > it > will not be expensive in the future. > * Add a ~[T] -> Vec conversion. Like the above step, this will also be > expensive, but it will not be so in the future. It may be expensive in some cases in the future, and I don't think cheap in *some* cases can ever be regarded as free or cheap when it's not a predictable cost.
signature.asc
Description: OpenPGP digital signature
_______________________________________________ Rust-dev mailing list [email protected] https://mail.mozilla.org/listinfo/rust-dev
