Prompted by the recent discussion about the syntax to use for placement
new, I took a few steps back to think about the bigger picture. I'm glad I
did, because I now think that the important part of the feature has been
misidentified so far. Correctly identifying it opens the door to a much
more general and powerful mechanism, with a smaller conceptual "surface
area", whose implementation burden doesn't appear any larger than that
of the placement new feature as envisioned so far (in particular, the
magic sauce is the same in both).


## What is placement new?

The C++ syntax for "placement new" is: `new (place) MyType(args...)`, where
`place` is a memory address. This constructs a new instance of `MyType` at
`place`. This is useful because it lets you avoid the overhead of first
constructing the `MyType` elsewhere and then copying or moving it into
`place`.

I used to think that placement new is not commonly used in C++, but I have
revised my opinion. As Daniel Micay reminded me in the other thread, C++
now has things like `emplace_back` in many places, which use placement new
under the hood to avoid the cost of a move. The reason C++ didn't
*previously* use placement new very often is that the lack of perfect
forwarding and variadic templates made doing so generically far too
painful.

This is also one of the things that got me thinking: restricting our
version of placement new to smart pointers seems artificial and not very
well-motivated, except by the fact that smart pointers are an important
use case and the built-in ones already have the feature. But we also want
`emplace_back`, don't we?


## What is placement new, *really*?

The C++ signature of `emplace_back` is:

    template<typename T> // I'm ignoring the allocator
    template<typename... Args>
    void vector<T>::emplace_back(Args&&... args);

What it does is reserve some memory at the end of the vector, then use
placement new to invoke the constructor of `T`, with `args`, into that
memory. How could we translate this to Rust? First of all, Rust doesn't
have constructors as a distinguished language feature. But a constructor is
not anything more than a function statically known based on a type: in
Rust, a (self-less) trait method. Rust doesn't have variadic templates
either, but we could use tuples. (As for perfect forwarding, I suspect it
solves a problem Rust doesn't have.) So then we have:

    trait Construct<Args> {
        fn construct(Args) -> Self;
    }

    fn emplace_back<Args, T: Construct<Args>>(self: &mut Vec<T>, args: Args);
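
To make that concrete, here is a minimal sketch of the same idea in compilable form. It is an illustration only: `EmplaceBack` is a hypothetical extension trait (a free function can't take `self`), `Point` is a made-up example type, and `push` stands in for the real thing, so whether the value actually gets built in place is left to the optimizer.

```rust
/// Hypothetical trait standing in for a "constructor": a self-less method
/// that builds `Self` from a tuple of arguments.
trait Construct<Args> {
    fn construct(args: Args) -> Self;
}

/// Hypothetical extension trait; `Vec` has no such method in the standard library.
trait EmplaceBack<T> {
    fn emplace_back<Args>(&mut self, args: Args)
    where
        T: Construct<Args>;
}

impl<T> EmplaceBack<T> for Vec<T> {
    fn emplace_back<Args>(&mut self, args: Args)
    where
        T: Construct<Args>,
    {
        // Statically dispatched: the concrete `construct` is picked at compile time.
        // `push` is a stand-in; it does not guarantee in-place construction.
        self.push(T::construct(args));
    }
}

/// Made-up example type.
#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

impl Construct<(i32, i32)> for Point {
    fn construct((x, y): (i32, i32)) -> Self {
        Point { x, y }
    }
}

// A second "constructor overload", selected by the type of the argument tuple.
impl Construct<(i32,)> for Point {
    fn construct((v,): (i32,)) -> Self {
        Point { x: v, y: v }
    }
}

fn demo() -> Vec<Point> {
    let mut points: Vec<Point> = Vec::new();
    points.emplace_back((1, 2));
    points.emplace_back((7,));
    points
}
```

Implementing `Construct` for several tuple types plays the role of overloaded constructors, with the argument tuple selecting which one runs.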

So in reality, placement new is nothing more than a statically dispatched
trait method call. The only reason C++ needs special syntax for it is
because C++ distinguishes constructors from other functions. In particular,
it's worth noticing that if C++'s functions used a hidden out-pointer
argument like Rust's functions do, then the C++ version of `emplace_back`
could just as well take a function pointer returning `T` as a non-type
template parameter, and call that instead of T's constructor. (Except C++
doesn't let you make a function pointer to a constructor either, so you
would still want both versions. C++'s complications are of its own design.)

(It's also worth noticing that C++ is using the same
pass-the-environment-explicitly encoding of closures here that Rust
occasionally does.)


## Magic sauce

All of that raises a question: if placement new is not much more than calling
a trait method, then why are we in the process of adding placement new? If
what we're adding isn't placement new, then what *is* it? Well: we *could*
use something like the above Rust version of `emplace_back`, but it would
be super-tedious. Right now if you write `~(this thing)` or `@(that
thing)`, this thing and that thing are constructed (placement newed)
directly into the allocated memory -- where this thing and that thing can
be any expression. That's really nice. We want other smart pointers to be
able to do that.

How does it work? If this thing or that thing is a function call, then the
out-pointer is set to the allocated memory. If it's an expression, then the
compiler emits code to put the result of the expression directly into the
memory. Without having direct knowledge, I imagine that's the same way it's
planned to work for other smart pointers. (I can't imagine any other way to
do it.) What this implies is that the function that allocates the memory is
monomorphized (has a separate version generated) not only on the type of
the object being constructed, but also on the specific expression
constructing it. This is no different from what happens with the current
built-in smart pointers, where the compiler has built-in special-cased code
generation for them. It's also no different from C++'s `emplace_back`,
except there the possible "expressions" are limited to the constructors
available for the type. (If you consider the hypothetical version passing a
function pointer as a template argument, then the number of possibilities
is again much larger.)

So what we're adding is this ability to monomorphize user-provided
functions (smart pointer constructors) on the expression provided as their
argument.
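
One way to get a feel for "monomorphized on the expression" is to wrap each expression in a closure by hand. This is only an approximation, not the proposed feature: `construct_in` and `closure_type` are hypothetical helpers, a `MaybeUninit` slot stands in for freshly allocated memory, and nothing here guarantees the value is built in place. The point is that every closure expression has its own anonymous type, so a function generic over that type gets a separate copy per expression.

```rust
use std::any::TypeId;
use std::mem::MaybeUninit;

/// Hypothetical helper: writes the value produced by `expr` into `slot`,
/// which stands in for freshly allocated memory.
fn construct_in<T, F: FnOnce() -> T>(slot: &mut MaybeUninit<T>, expr: F) -> &mut T {
    slot.write(expr())
}

/// Hypothetical helper: returns the identity of the closure's anonymous type.
fn closure_type<F: 'static>(_expr: &F) -> TypeId {
    TypeId::of::<F>()
}

fn demo() -> (u64, u64, bool) {
    // Two "expressions" with the same signature but distinct types.
    let first = || 1u64 + 2;
    let second = || 10u64 * 10;
    let same_type = closure_type(&first) == closure_type(&second);

    // `u64` needs no destructor, so leaving the slots as `MaybeUninit` is fine here.
    let mut slot_a = MaybeUninit::<u64>::uninit();
    let mut slot_b = MaybeUninit::<u64>::uninit();
    let a = *construct_in(&mut slot_a, first);
    let b = *construct_in(&mut slot_b, second);
    (a, b, same_type)
}
```

Here `same_type` comes out `false`, so the two `construct_in` calls are instantiated with different `F`, one per expression.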


## Proposal

Given that we're adding this magic one way or another, I propose to make it
more generally available. Rather than restrict it to smart pointer
constructors, tied in to the compiler with some kind of built-in trait,
let's let *any* function use it: let them declare that any particular
argument is to be passed "by-expression" in this way, by monomorphizing the
function over it. To avoid duplicating side effects and such, in the body
of the function, this argument may only be used once (which already holds
in the case of things like smart pointer constructors and `emplace_back`).
(So it's kind of like call-by-name, except it happens statically and can
only be called once.)

With regard to syntax, I don't have any ideas yet that I'm particularly
satisfied with. Here are some ideas:

    fn new_gc<T>(x: expr T) -> Gc<T>;

    fn new_gc<T, Exp: expr T>(x: Exp) -> Gc<T>;

    fn new_gc<T, static x: expr T>(x) -> Gc<T>;

It's difficult because the expression parameter wants to be passed both
between `<>`, because it's passed at compile time and leads to
monomorphization, and also between `()`, because that's where the
programmer actually writes the argument when calling it. I think I like the
second possibility best so far: in that case you can interpret it as
working with inference like type parameters do. (But I don't even know
whether or not there might be other good possibilities besides an `expr`
keyword, as in all three of these.)
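
For comparison, the second form has roughly the shape of a generic function bounded by a closure trait. The sketch below is an analogy under stated assumptions, not the proposal itself: `std::rc::Rc` stands in for `Gc`, a closure stands in for the `expr` parameter, and the caller has to write `|| ...` explicitly, which is exactly what the proposal would make unnecessary. In-place construction is again left to the optimizer.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Stand-in for `fn new_gc<T, Exp: expr T>(x: Exp) -> Gc<T>`, with `Rc` playing `Gc`.
// `FnOnce` mirrors the "may only be used once" rule.
fn new_gc<T, Exp: FnOnce() -> T>(x: Exp) -> Rc<T> {
    // A second `x()` here would not compile: calling an `FnOnce` consumes it.
    Rc::new(x())
}

fn demo() -> (i32, u32) {
    let evaluations = Cell::new(0u32);
    let gc = new_gc(|| {
        // Side effect of the "expression", to count how often it runs.
        evaluations.set(evaluations.get() + 1);
        6 * 7
    });
    (*gc, evaluations.get())
}
```

The expression's side effect happens exactly once, inside the callee, and `Exp` is inferred at the call site just like an ordinary type parameter.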


Anyways, upshot:

 * Instead of being a restricted subset of C++'s, our "placement new" ability
becomes much broader and more convenient than C++'s

 * We no longer need special "placement new" syntax, and can keep writing
`Gc::new(some thing)`, because the magic is moved to the callee

 * In exchange, we do need special syntax in the callee, so bikeshed
painters will not have their employment prospects diminished.


Thanks for reading,
Gábor

-- 
Your ship was destroyed in a monadic eruption.
_______________________________________________
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev
