On 02/04/14 11:35 AM, Alex Crichton wrote:
> I've noticed recently that there seems to be a bit of confusion about the fate
> of ~[T] with an impending implementation of DST on the horizon. This has been
> accompanied with a number of pull requests to completely remove many uses of
> ~[T] throughout the standard distribution. I'd like to take some time to
> straighten out what's going on with Vec<T> and ~[T].

I think this is a difference of opinion, not "confusion". The original
pull requests switching `~[T]` to `Vec<T>` were done by pcwalton, and
this was with full knowledge of the plans for `~[T]`.

> # Vec<T>
> 
> In a post-DST world, Vec<T> will be the "vector builder" type. It will be the
> only type for building up a block of contiguous elements. This type exists
> today, and lives inside of std::vec. Today, you cannot index Vec<T>, but this
> will be enabled in the future once the indexing traits are fleshed out.

It will be Rust's vector (dynamic array) type. I don't think it makes
sense to call it a 'builder' any more than it makes sense to call
`HashMap<K, V>` a 'hash table builder'. It makes something simple far
more complicated than it needs to be.

> This type will otherwise largely not change from what it is today. It will
> continue to occupy three words in memory, and continue to have the same 
> runtime
> semantics.
> 
> # ~[T]
> 
> The type ~[T] will still exist in a post-DST, but its representation will
> change. Today, a value of type ~[T] is one word (I'll elide the details of 
> this
> for now). After DST is implemented, ~[T] will be a two-word value of the 
> length
> and a pointer to an array (similarly to what slices are today). The ~[T] type
> will continue to have move semantics, and you can borrow it to &[T] as usual.

The `~[T]` type will exist because `[T]` will exist as a type. It won't
be an explicit choice to support having it. Some of us consider it an
unfortunate consequence of DST rather than a useful type.

> The major difference between today's ~[T] type and a post-DST ~[T] is that the
> push() method will be removed. There is no knowledge of a capacity in the
> representation of a ~[T] value, so a push could not be supported at all. In
> theory a pop() can be efficiently supported, but it will likely not be
> implemented at first.

A `pop` or `shift` function is impossible to implement efficiently if
allocators require a size to be passed to `free`.

> # [T]
> 
> As part of DST, the type grammar will start accepting [T] as a possible
> substitute for type parameters. This basically means that if your type
> parameters is &T, then &[U] can satisfy the type parameter.
> 
> While possible, I imagine that it will be rare for this to appear in apis. 
> This
> is an unsized type, which means that it's more limited what you can do with it
> than you can with a sized type.
> 
> The full details of [T] will become apparent once DST is implemented, but it's
> safe to say that APIs and usage should rarely have to deal with this type, and
> it will likely be mostly transparent.
> 
> # Converting between Vec<T> and ~[T]
> 
> Conversions between these two types will be provided, and the default
> implementations will be free. Converting from Vec<T> to ~[T] will be simply
> forgetting the capacity, and converting from ~[T] to Vec<T> will set the
> capacity to the length.

Converting from `Vec<T>` to `~[T]` will not be free with an efficient
allocation scheme. I don't think Rust will want to be using a legacy
`malloc`/`free` style API as the underlying default allocator in the
future. I see it only as a temporary measure before a modern allocation
model is implemented.

Without a size parameter to `free`, an allocator needs to track the size
of allocations manually. It increases the memory overhead, along with
adding bookkeeping overhead.

C++ allocators take a `size` parameter to the `deallocate` function for
this reason and I expect Rust will want to do the same. The design of
`malloc` and `free` is far from ideal, because the length is either
known statically or dynamically in nearly every case.

I think leaving out the capacity field of vectors in some cases without
dropping the excess capacity is an an insignificant micro-optimization.
In contract, passing the length to `free` is quite valuable and will
result in a measurable performance win across nearly all Rust code with
an allocator taking advantage of it.

> Helper methods will likely be provided to perform a forceful reallocating
> shrink when going from Vec<T> to ~[T], but it will not be the default.

It has to be the *only* way to do it if Rust is going to be able to
switch to an efficient allocation model in the future. The API of
`malloc`, `realloc` and `free` is purely a legacy wart and shouldn't
drive the design of a new language/library.

> ## The cost of Vec<T> => ~[T]
> 
> Some concerns have been brought up that this can in theory be a costly
> transition under the assumption that this does a reallocation of memory to
> shrink to the capacity to exactly the length. This will likely not be the
> default implementation.

I think it likely will be the *only* implementation.

> Some concerns have then been brought up that some allocators require the size
> of the allocation to be passed to free(), and that this model is incompatible
> with that flavor of allocator. We believe that this fear can be
> alleviated with a "shrink if necessary" method on allocators. The default
> allocator (backed by the system malloc) would be a no-op because the size to
> free is not used. Allocators which use the size passed to free would actually
> perform a reallocation.

The default allocator will be taking a length to `free` if it's an
efficient implementation. There's no reason for Rust to remain tied to
the obsolete `malloc`, `realloc` and `free` API forever.

> # Choosing between Vec<T> and ~[T]
> 
> Primarily, if you need a growable vector, you should use Vec<T>. If you do not
> need a growable vector, but you're instead just dealing with an array of 
> items,
> then you should use ~[T].
> 
> As a concrete example, I'll take the read_to_end() method on io's Reader 
> trait.
> This type must use a Vec<T> internally to read data into the vector, but it 
> will
> return a ~[T] because the contents are conceptually frozen after they have 
> been
> read.
> 
> There is no blanket right decision to choose between Vec<T> and ~[T], this 
> will
> need to be done on a case-by-case basis to evaluate whether apis should take 
> or
> consume Vec<T> or ~[T].
> 
> # Moving Forward
> 
> In order to implement DST, it is not necessary to remove all usage of ~[T]
> today. It is necessary to remove all *growable* usage of ~[T], however. All 
> uses
> of vectors which need growable or shrinkable vectors need to switch to Vec<T>.
> If a vector does not need to be grown or shrunk, it can remain as ~[T].

If `~[T]` remains used throughout the libraries, Rust will become
noisier than languages like C++ with a unified vector type. The need to
convert between `Vec<T>` and `~[T]` would add noise to lots of code,
without any adding measurable optimization win. A micro-optimization
shouldn't drive the design of the libraries, especially when it will
prevent making a significant *macro*-optimization (passing a length to
the deallocation function).

> Concretely speaking, the next steps forward for ~[T] would entail:
> 
> * Add a Vec<T> -> ~[T] conversion. This will be an expensive conversion today
>   because it requires an allocation (due to the layout of today's ~[T]), but 
> it
>   will not be expensive in the future.
> * Add a ~[T] -> Vec conversion. Like the above step, this will also be
>   expensive, but it will not be so in the future.

It may be expensive in some cases in the future, and I don't think cheap
in *some* cases can ever be regarded as free or cheap when it's not a
predictable cost.

Attachment: signature.asc
Description: OpenPGP digital signature

_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev

Reply via email to