Prompted by the recent discussion about the syntax to use for placement new, I took a few steps back to think about the bigger picture. I'm glad I did, because I now think that the important part of the feature has been misidentified so far. Correctly identifying it opens the door to a much more general and powerful mechanism, with a smaller conceptual "surface area", whose implementation burden doesn't appear any larger than that of the placement new feature as envisioned up to now (in particular, the magic sauce is the same in both).

## What is placement new?

The C++ syntax for "placement new" is `new (place) MyType(args...)`, where `place` is a memory address. This constructs a new instance of `MyType` into `place`. This is useful because it lets you avoid the overhead of first constructing the `MyType` elsewhere and then copying or moving it into `place`.

I used to think that placement new is not commonly used in C++, but I have revised my opinion. As Daniel Micay reminded me in the other thread, C++ now has things like `emplace_back` in many places, which use placement new under the hood to avoid the cost of a move. The reason C++ didn't *previously* use placement new very often is that the lack of perfect forwarding and variadic templates made doing so in a generic way far too painful.

This is also one of the things that got me thinking: restricting our version of placement new to smart pointers seems artificial and not very well-motivated, except by the fact that smart pointers are an important use case and the built-in ones already have the feature. But we also want `emplace_back`, don't we?

## What is placement new, *really*?

The C++ signature of `emplace_back` is:

    template<typename T> // I'm ignoring the allocator
    template<typename... Args>
    void vector<T>::emplace_back(Args&&... args);

What it does is reserve some memory at the end of the vector, then use placement new to invoke the constructor of `T`, with `args`, into that memory. How could we translate this to Rust? First of all, Rust doesn't have constructors as a distinguished language feature. But a constructor is nothing more than a function statically known based on a type: in Rust, a (self-less) trait method. Rust doesn't have variadic templates either, but we could use tuples. (As for perfect forwarding, I suspect it solves a problem Rust doesn't have.)
So then we have:

    trait Construct<Args> {
        fn construct(Args) -> Self;
    }

    fn emplace_back<Args, T: Construct<Args>>(self: &mut Vec<T>, args: Args);

So in reality, placement new is nothing more than a statically dispatched trait method call. The only reason C++ needs special syntax for it is that C++ distinguishes constructors from other functions. In particular, it's worth noticing that if C++ functions used a hidden out-pointer argument like Rust's functions do, then the C++ version of `emplace_back` could just as well take a function pointer returning `T` as a non-type template parameter, and call that instead of `T`'s constructor. (Except C++ doesn't let you make a function pointer to a constructor either, so you would still want both versions. C++'s complications are of its own design.)

(It's also worth noticing that C++ is using the same pass-the-environment-explicitly encoding of closures here that Rust occasionally does.)

## Magic sauce

All of that raises a question: if placement new is not much more than calling a trait method, then why are we in the process of adding placement new? If what we're adding isn't placement new, then what *is* it?

Well: we *could* use something like the above Rust version of `emplace_back`, but it would be super-tedious. Right now if you write `~(this thing)` or `@(that thing)`, this thing and that thing are constructed (placement newed) directly into the allocated memory, where this thing and that thing can be any expression. That's really nice. We want other smart pointers to be able to do that.

How does it work? If this thing or that thing is a function call, then the out-pointer is set to the allocated memory. If it's some other expression, then the compiler emits code to put the result of the expression directly into the memory. Without having direct knowledge, I imagine that's the same way it's planned to work for other smart pointers. (I can't imagine any other way to do it.)

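
As a rough illustration of the `Construct` translation discussed above, here is a minimal sketch in present-day Rust syntax. The extension trait `EmplaceBack`, the `Point` type, and the `demo` function are hypothetical names introduced only for this example; `Vec` has no `emplace_back` method. The sketch shows only the static dispatch on the argument tuple. It does not guarantee that the value is built directly in the vector's buffer; that part is left to the optimizer here.

```rust
// Hypothetical trait from the post: a "constructor" is just a statically
// dispatched associated function, with a tuple standing in for C++'s
// variadic argument pack.
trait Construct<Args> {
    fn construct(args: Args) -> Self;
}

#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// "Constructor" taking two arguments, passed as a tuple.
impl Construct<(i32, i32)> for Point {
    fn construct((x, y): (i32, i32)) -> Self {
        Point { x, y }
    }
}

// A second "overload", selected purely by the argument type.
impl Construct<i32> for Point {
    fn construct(v: i32) -> Self {
        Point { x: v, y: v }
    }
}

// Extension trait (an assumption for illustration, not a std API).
trait EmplaceBack<T> {
    fn emplace_back<Args>(&mut self, args: Args)
    where
        T: Construct<Args>;
}

impl<T> EmplaceBack<T> for Vec<T> {
    fn emplace_back<Args>(&mut self, args: Args)
    where
        T: Construct<Args>,
    {
        // Make room first, then build the value from `args`.
        // Whether the result is written straight into the buffer is
        // up to the optimizer; this sketch does not force it.
        self.reserve(1);
        self.push(T::construct(args));
    }
}

// Small usage example: two elements built through different "constructors".
fn demo() -> Vec<Point> {
    let mut v: Vec<Point> = Vec::new();
    v.emplace_back((1_i32, 2_i32));
    v.emplace_back(7_i32);
    v
}
```

Calling `demo()` yields a vector with one `Point` built from the pair and one built from the single integer. Each call to `emplace_back` is monomorphized on `Args`, which is the tedious, per-constructor version of what the built-in pointers get for arbitrary expressions.
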
What this implies is that the function that allocates the memory is monomorphized (has a separate version generated) not only on the type of the object being constructed, but also on the specific expression constructing it. This is no different from what happens with the current built-in smart pointers, where the compiler has built-in special-cased code generation for them. It's also no different from C++'s `emplace_back`, except there the possible "expressions" are limited to the constructors available for the type. (If you consider the hypothetical version passing a function pointer as a template argument, then the number of possibilities is again much larger.)

So what we're adding is this ability to monomorphize user-provided functions (smart pointer constructors) on the expression provided as their argument.

## Proposal

Given that we're adding this magic one way or another, I propose to make it more generally available. Rather than restrict it to smart pointer constructors, tied in to the compiler with some kind of built-in trait, let's let *any* function use it: let them declare that any particular argument is to be passed "by-expression" in this way, by monomorphizing the function over it. To avoid duplicating side effects and such, in the body of the function this argument may only be used once (which already holds in the case of things like smart pointer constructors and `emplace_back`). (So it's kind of like call-by-name, except it happens statically and the argument can only be evaluated once.)

With regard to syntax, I don't have any ideas yet that I'm particularly satisfied with.
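
One way to approximate a by-expression argument in present-day Rust is to pass a closure bounded by `FnOnce`: the callee is monomorphized per closure, and the closure can be called at most once. The sketch below assumes a toy `Gc<T>` backed by `Box<T>` and a hypothetical `new_with` constructor. It is not the proposed feature, and it does not by itself guarantee in-place construction.

```rust
// Toy stand-in for a user-defined smart pointer (assumption: backed by Box).
struct Gc<T> {
    inner: Box<T>,
}

impl<T> Gc<T> {
    // `make` plays the role of the by-expression argument. This function is
    // monomorphized separately for every closure type `F`, and `FnOnce`
    // means the body can evaluate the "expression" at most once.
    fn new_with<F: FnOnce() -> T>(make: F) -> Gc<T> {
        // The expression is evaluated here, inside the callee,
        // not at the call site.
        Gc { inner: Box::new(make()) }
    }

    fn get(&self) -> &T {
        &self.inner
    }
}

// Usage example: count how often the "expression" is evaluated.
fn demo() -> (u32, Vec<u32>) {
    let mut evaluations = 0;
    let gc = Gc::new_with(|| {
        evaluations += 1;
        vec![1, 2, 3]
    });
    (evaluations, gc.get().clone())
}
```

The caller still has to write `|| ...` explicitly, which is the part the proposal would hide. That leaves the question of how to mark such a parameter in the callee's signature.
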
Here are some ideas:

    fn new_gc<T>(x: expr T) -> Gc<T>;

    fn new_gc<T, Exp: expr T>(x: Exp) -> Gc<T>;

    fn new_gc<T, static x: expr T>(x) -> Gc<T>;

It's difficult because the expression parameter wants to be passed both between `<>`, because it's passed at compile time and leads to monomorphization, and also between `()`, because that's where the programmer actually writes the argument when calling it. I think I like the second possibility best so far: in that case you can interpret it as working with inference, like type parameters do. (But I don't even know whether there might be other good possibilities besides an `expr` keyword, as in all three of these.)

Anyways, upshot:

* Instead of being a restricted subset of C++'s, our "placement new" ability becomes much broader and more convenient than C++'s
* We no longer need special "placement new" syntax, and can keep writing `Gc::new(some thing)`, because the magic is moved to the callee
* In exchange, we do need special syntax in the callee, so bikeshed painters will not have their employment prospects diminished.

Thanks for reading,
Gábor

--
Your ship was destroyed in a monadic eruption.
_______________________________________________
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev