Ok, here are my few cents:

On Monday, 29 September 2014 at 10:49:53 UTC, Andrei Alexandrescu wrote:
Back when I first introduced RCString I hinted that we have a larger strategy in mind. Here it is.

The basic tenet of the approach is to reckon and act on the fact that memory allocation (the subject of allocators) is an entirely distinct topic from memory management, and more generally resource management. This clarifies that it would be wrong to approach alternatives to GC in Phobos by means of allocators. GC is not only an approach to memory allocation, but also an approach to memory management. Reducing it to either one is a mistake. In hindsight this looks rather obvious but it has caused me and many people better than myself a lot of headache.

I would argue that GC is at its core _only_ a memory management strategy. It just so happens that the one in D's runtime also comes with an allocator, with which it is tightly integrated. In theory, a GC can work with any (and multiple) allocators, and you could of course also call GC.free() manually, because, as you say, management and allocation are entirely distinct topics.
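
To illustrate the point about calling GC.free() manually, here is a minimal sketch using druntime's core.memory. The block size and the NO_SCAN attribute are arbitrary choices for the example. The memory comes from the GC's allocator, but its lifetime is managed by hand:

```d
import core.memory : GC;

void main()
{
    // Allocate 64 bytes from the GC heap. NO_SCAN tells the collector
    // that the block contains no pointers it would need to follow.
    auto p = cast(ubyte*) GC.malloc(64, GC.BlkAttr.NO_SCAN);
    assert(GC.sizeOf(p) >= 64); // the GC knows this block

    p[0 .. 64] = 0;

    // Management is decided here, by the caller: release the block
    // explicitly instead of waiting for a collection cycle.
    GC.free(p);
}
```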


That said allocators are nice to have and use, and I will definitely follow up with std.allocator. However, std.allocator is not the key to a @nogc Phobos.

Agreed.


Nor are ranges. There is an attitude that either output ranges, or input ranges in conjunction with lazy computation, would solve the issue of creating garbage. https://github.com/D-Programming-Language/phobos/pull/2423 is a good illustration of the latter approach: a range would be lazily created by chaining stuff together. A range-based approach would take us further than the allocators, but I see the following issues with it:

(a) the whole approach doesn't stand scrutiny for non-linear outputs, e.g. outputting some sort of associative array or really any composite type quickly becomes tenuous either with an output range (eager) or with exposing an input range (lazy);

(b) makes the style of programming without GC radically different, and much more cumbersome, than programming with GC; as a consequence, programmers who consider changing one approach to another, or implementing an algorithm neutral to it, are looking at a major rewrite;

(c) would make D/@nogc a poor cousin of C++. This is quite out of character; technically, I have long gotten used to seeing most elaborate C++ code like poor emulation of simple D idioms. But C++ has spent years and decades taking to perfection an approach without a tracing garbage collector. A departure from that would need to be superior, and that doesn't seem to be the case with range-based approaches.

I agree with this, too.


===========

Now that we clarified that these existing attempts are not going to work well, the question remains what does. For Phobos I'm thinking of defining and using three policies:

enum MemoryManagementPolicy { gc, rc, mrc }
immutable
    gc = MemoryManagementPolicy.gc,
    rc = MemoryManagementPolicy.rc,
    mrc = MemoryManagementPolicy.mrc;

The three policies are:

(a) gc is the classic garbage-collected style of management;

(b) rc is a reference-counted style still backed by the GC, i.e. the GC will still be able to pick up cycles and other kinds of leaks.

(c) mrc is a reference-counted style backed by malloc.

(It should be possible to collapse rc and mrc together and make the distinction dynamically, at runtime. I'm distinguishing them statically here for expository purposes.)

The policy is a template parameter to functions in Phobos (and elsewhere), and informs the functions e.g. what types to return. Consider:

auto setExtension(MemoryManagementPolicy mmp = gc, R1, R2)(R1 path, R2 ext)
if (...)
{
    static if (mmp == gc) alias S = string;
    else alias S = RCString;
    S result;
    ...
    return result;
}

On the caller side:

auto p1 = setExtension("hello", ".txt"); // fine, use gc
auto p2 = setExtension!gc("hello", ".txt"); // same
auto p3 = setExtension!rc("hello", ".txt"); // fine, use rc

So by default it's going to continue being business as usual, but certain functions will allow passing in a (defaulted) policy for memory management.

This, however, I disagree with strongly. For one thing - this has already been noted by others - it would make the functions' implementation extremely ugly (`static if` hell), it would make them harder to unit test, and from a user's point of view, it's very tedious and might interfere badly with UFCS.

But more importantly, IMO, it's the wrong thing to do. These functions shouldn't know anything about memory management policy at all. They allocate, which means they need to know about _allocation_ policy, but memory _management_ policy needs to be decided by the user.

Now, your suggestion in a way still leaves that decision to the user, but does so in a very intrusive way, by passing a template flag. This is clearly a violation of the separation of concerns. Contrary to the typical case, implementation details of the user's code leak into the library code, and not the other way round, but that's just as bad.

I'm convinced this isn't necessary. Let's take `setExtension()` as an example, standing in for any of a class of similar functions. This function allocates memory, returns it, and abandons it; it gives up ownership of the memory. The fact that the memory has been freshly allocated means that it is (head) unique, and therefore the caller (= library user) can take over the ownership. This, in turn, means that the caller can decide how she wants to manage it.

(I'll try to make a sketch of how this can be implemented in another post.)
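
Until then, a rough illustration of the idea, not the full design. Everything named here is hypothetical: `UniqueChars` is a made-up non-copyable owner of a malloc-backed buffer, and `setExtensionUnique` is a simplified stand-in for the real function. The point is only that the library function knows nothing about management, and the caller picks the policy afterwards:

```d
import core.stdc.stdlib : free, malloc;
import std.path : stripExtension;

/// Hypothetical sole owner of a freshly allocated character buffer.
struct UniqueChars
{
    private char* ptr;
    private size_t len;

    @disable this(this);   // no copies, so there is exactly one owner

    ~this() { free(ptr); } // free(null) is a no-op

    /// Borrowed view, valid only while this owner is alive.
    const(char)[] view() const { return ptr[0 .. len]; }

    /// The caller opts into GC management. This sketch copies; a real
    /// design might let the GC adopt the allocation without a copy.
    string toGC()
    {
        string s = view().idup;
        free(ptr);
        ptr = null;
        len = 0;
        return s;
    }
}

/// Simplified stand-in for setExtension: allocates, fills, gives up ownership.
UniqueChars setExtensionUnique(const(char)[] path, const(char)[] ext)
{
    auto base = stripExtension(path);
    UniqueChars r;
    r.len = base.length + ext.length;
    r.ptr = cast(char*) malloc(r.len);
    assert(r.ptr !is null || r.len == 0, "out of memory"); // sketch only
    r.ptr[0 .. base.length] = base[];
    r.ptr[base.length .. r.len] = ext[];
    return r; // moved out, not copied
}

void main()
{
    // Caller A wants an ordinary GC string.
    string s = setExtensionUnique("hello.d", ".txt").toGC();
    assert(s == "hello.txt");

    // Caller B keeps deterministic, scope-based ownership.
    {
        auto u = setExtensionUnique("hello", ".txt");
        assert(u.view() == "hello.txt");
    } // buffer freed here by the destructor
}
```

A reference-counted wrapper could take over the unique value in the same way, by moving it in; I left that out to keep the sketch short.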

As a conclusion, I would say that APIs should strive for the following principles, in this order:

1. Avoid allocation altogether, for example by laziness (ranges), or by accepting sinks.

2. If allocations are necessary (or desirable, to make the API more easily usable), try hard to return a unique value (this of course needs to be expressed in the return type).

3. If both of the above fail, only then return a GCed pointer, or alternatively provide several variants of the function (though this shouldn't be necessary often). An interesting alternative: Instead of passing a flag directly describing the policy, pass the function a type that it should wrap its return value in.
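
Principle 1 can be sketched roughly like this. `writeWithExtension` and `FixedSink` are made-up names for the example, and the 64-byte capacity is arbitrary. The function only writes into whatever sink the caller supplies, so the same code serves a stack buffer and a GC-backed appender:

```d
import std.array : appender;
import std.range.primitives : put;

/// Writes base followed by ext into a caller-supplied sink.
/// The function itself allocates nothing.
void writeWithExtension(Sink)(ref Sink sink, const(char)[] base, const(char)[] ext)
{
    put(sink, base);
    put(sink, ext);
}

/// Hypothetical fixed-capacity sink that lives entirely on the stack.
struct FixedSink
{
    char[64] buf;
    size_t len;

    void put(const(char)[] s)
    {
        // Bounds-checked: writing past the capacity is a range error.
        buf[len .. len + s.length] = s[];
        len += s.length;
    }

    const(char)[] data() const { return buf[0 .. len]; }
}

void main()
{
    // No heap allocation at all.
    FixedSink onStack;
    writeWithExtension(onStack, "hello", ".txt");
    assert(onStack.data() == "hello.txt");

    // Same function, but the caller chose a GC-backed sink.
    auto onHeap = appender!string();
    writeWithExtension(onHeap, "hello", ".txt");
    assert(onHeap.data == "hello.txt");
}
```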

As for the _allocation_ strategy: It indeed needs to be configurable, but here, the same objections against a template parameter apply. As the allocator doesn't necessarily need to be part of the type, a (thread) global variable can be used to specify it. This lends itself well to idioms like

    with(MyAllocator alloc) {
        // ...
    }
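
A minimal sketch of how such a scoped switch could look, with a scope guard standing in for the `with` idiom above. The `Allocator` interface, both allocator classes, `currentAllocator`, `UseAllocator` and `join2` are all invented for this example and are not an existing Phobos API:

```d
import core.stdc.stdlib : free, malloc;

/// Hypothetical minimal allocator interface.
interface Allocator
{
    void[] allocate(size_t n);
    void deallocate(void[] b);
}

final class GCAllocator : Allocator
{
    void[] allocate(size_t n) { return new ubyte[n]; }
    void deallocate(void[] b) { /* left to the collector */ }
}

final class Mallocator : Allocator
{
    size_t live; // outstanding allocations, for demonstration only

    void[] allocate(size_t n)
    {
        auto p = malloc(n);
        if (p is null) return null;
        ++live;
        return p[0 .. n];
    }

    void deallocate(void[] b)
    {
        if (b.ptr is null) return;
        free(b.ptr);
        --live;
    }
}

// Module-level variables are thread-local by default in D.
Allocator currentAllocator;

static this() { currentAllocator = new GCAllocator; }

/// Installs an allocator and restores the previous one on scope exit.
struct UseAllocator
{
    private Allocator saved;
    @disable this();
    @disable this(this);
    this(Allocator a) { saved = currentAllocator; currentAllocator = a; }
    ~this() { currentAllocator = saved; }
}

/// Library code: no template flag, it just asks the current allocator.
char[] join2(const(char)[] a, const(char)[] b)
{
    auto buf = cast(char[]) currentAllocator.allocate(a.length + b.length);
    buf[0 .. a.length] = a[];
    buf[a.length .. $] = b[];
    return buf;
}

void main()
{
    auto m = new Mallocator;
    {
        auto guard = UseAllocator(m);
        auto s = join2("hello", ".txt");
        assert(s == "hello.txt");
        assert(m.live == 1);
        currentAllocator.deallocate(s);
    } // previous allocator restored here
    assert(m.live == 0);
    assert(cast(GCAllocator) currentAllocator !is null);
}
```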


Destroy!

Done :-)
