Re: RFC: moving forward with @nogc Phobos

via Digitalmars-d Tue, 30 Sep 2014 12:16:23 -0700

Ok, here are my few cents:

On Monday, 29 September 2014 at 10:49:53 UTC, Andrei Alexandrescu wrote:

Back when I first introduced RCString I hinted that we have a larger strategy in mind. Here it is.
The basic tenet of the approach is to reckon and act on the fact that memory allocation (the subject of allocators) is an entirely distinct topic from memory management, and more generally resource management. This clarifies that it would be wrong to approach alternatives to GC in Phobos by means of allocators. GC is not only an approach to memory allocation, but also an approach to memory management. Reducing it to either one is a mistake. In hindsight this looks rather obvious but it has caused me and many people better than myself a lot of headache.

I would argue that GC is at its core _only_ a memory management strategy. It just so happens that the one in D's runtime also comes with an allocator, with which it is tightly integrated. In theory, a GC can work with any (and multiple) allocators, and you could of course also call GC.free() manually, because, as you say, management and allocation are entirely distinct topics.
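
As a small illustration of that separation, here is a sketch using druntime's `core.memory.GC` interface. The first half releases GC-allocated memory by hand; the second half registers a C-heap block with the collector so that it is scanned for pointers, while freeing it remains the caller's job. The block sizes are arbitrary choices for the example.

```d
import core.memory : GC;
import core.stdc.stdlib : calloc, free;

void main()
{
    // Allocated by the GC's allocator, lifetime managed manually.
    void* fromGC = GC.malloc(64);
    assert(GC.addrOf(fromGC) !is null); // the GC knows this block
    GC.free(fromGC);                    // released without waiting for a collection

    // Allocated by the C heap, contents made visible to the GC's scanner.
    enum size = 4 * (void*).sizeof;
    void* fromC = calloc(1, size);
    assert(fromC !is null);
    assert(GC.addrOf(fromC) is null);   // not a GC block
    GC.addRange(fromC, size);           // pointers stored here now keep GC objects alive
    // ... store references to GC objects in the block ...
    GC.removeRange(fromC);
    free(fromC);                        // the GC never frees this block itself
}
```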

That said, allocators are nice to have and use, and I will definitely follow up with std.allocator. However, std.allocator is not the key to a @nogc Phobos.


Agreed.

Nor are ranges. There is an attitude that either output ranges, or input ranges in conjunction with lazy computation, would solve the issue of creating garbage. https://github.com/D-Programming-Language/phobos/pull/2423 is a good illustration of the latter approach: a range would be lazily created by chaining stuff together. A range-based approach would take us further than the allocators, but I see the following issues with it:
(a) the whole approach doesn't stand scrutiny for non-linear outputs, e.g. outputting some sort of associative array or really any composite type quickly becomes tenuous either with an output range (eager) or with exposing an input range (lazy);
(b) makes the style of programming without GC radically different, and much more cumbersome, than programming with GC; as a consequence, programmers who consider changing one approach to another, or implementing an algorithm neutral to it, are looking at a major rewrite;
(c) would make D/@nogc a poor cousin of C++. This is quite out of character; technically, I have long gotten used to seeing most elaborate C++ code like poor emulation of simple D idioms. But C++ has spent years and decades taking to perfection an approach without a tracing garbage collector. A departure from that would need to be superior, and that doesn't seem to be the case with range-based approaches.


I agree with this, too.

===========
Now that we clarified that these existing attempts are not going to work well, the question remains what does. For Phobos I'm thinking of defining and using three policies:
enum MemoryManagementPolicy { gc, rc, mrc }
immutable
    gc = MemoryManagementPolicy.gc,
    rc = MemoryManagementPolicy.rc,
    mrc = MemoryManagementPolicy.mrc;

The three policies are:

(a) gc is the classic garbage-collected style of management;
(b) rc is a reference-counted style still backed by the GC, i.e. the GC will still be able to pick up cycles and other kinds of leaks.
(c) mrc is a reference-counted style backed by malloc.
(It should be possible to collapse rc and mrc together and make the distinction dynamically, at runtime. I'm distinguishing them statically here for expository purposes.)
The policy is a template parameter to functions in Phobos (and elsewhere), and informs the functions e.g. what types to return. Consider:
auto setExtension(MemoryManagementPolicy mmp = gc, R1, R2)(R1 path, R2 ext)
if (...)
{
    static if (mmp == gc) alias S = string;
    else alias S = RCString;
    S result;
    ...
    return result;
}

On the caller side:

auto p1 = setExtension("hello", ".txt"); // fine, use gc
auto p2 = setExtension!gc("hello", ".txt"); // same
auto p3 = setExtension!rc("hello", ".txt"); // fine, use rc
So by default it's going to continue being business as usual, but certain functions will allow passing in a (defaulted) policy for memory management.

This, however, I disagree with strongly. For one thing - this has already been noted by others - it would make the functions' implementation extremely ugly (`static if` hell), it would make them harder to unit test, and from a user's point of view, it's very tedious and might interfere badly with UFCS.

But more importantly, IMO, it's the wrong thing to do. These functions shouldn't know anything about memory management policy at all. They allocate, which means they need to know about _allocation_ policy, but memory _management_ policy needs to be decided by the user.

Now, your suggestion in a way still leaves that decision to the user, but does so in a very intrusive way, by passing a template flag. This is clearly a violation of the separation of concerns. Contrary to the typical case, implementation details of the user's code leak into the library code, and not the other way round, but that's just as bad.

I'm convinced this isn't necessary. Let's take `setExtension()` as an example, standing in for any of a class of similar functions. This function allocates memory, returns it, and abandons it; it gives up ownership of the memory. The fact that the memory has been freshly allocated means that it is (head) unique, and therefore the caller (= library user) can take over the ownership. This, in turn, means that the caller can decide how she wants to manage it.

(I'll try to make a sketch of how this can be implemented in another post.)
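
Independently of whatever form that sketch takes, one narrow case of this already works in D today: the result of a `pure` function whose parameters are all immutable is known to be unique, so the compiler lets the caller bind it as mutable or as immutable. The following only illustrates that existing rule, with a hypothetical `joinExtension` standing in for `setExtension()` (it simply concatenates). It still allocates with the GC and says nothing about how ownership would be handed to a reference-counted wrapper.

```d
// Hypothetical stand-in for setExtension(): builds a fresh buffer and gives it up.
char[] joinExtension(string path, string ext) pure
{
    auto result = new char[](path.length + ext.length);
    result[0 .. path.length] = path[];
    result[path.length .. $] = ext[];
    return result; // freshly allocated, no other reference exists
}

void main()
{
    // The caller decides what to do with the unique result.
    char[] mutableCopy = joinExtension("hello", ".txt"); // keep it mutable
    string frozen = joinExtension("hello", ".txt");      // or freeze it as immutable

    mutableCopy[0] = 'H';
    assert(mutableCopy == "Hello.txt");
    assert(frozen == "hello.txt");
}
```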

As a conclusion, I would say that APIs should strive for the following principles, in this order:

1. Avoid allocation altogether, for example by laziness (ranges), or by accepting sinks.

2. If allocations are necessary (or desirable, to make the API more easily usable), try hard to return a unique value (this of course needs to be expressed in the return type).

3. If both of the above fail, only then return a GCed pointer, or alternatively provide several variants of the function (though this shouldn't be necessary often). An interesting alternative: Instead of passing a flag directly describing the policy, pass the function a type that it should wrap its return value in.
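
The first principle, accepting a sink, might look like the sketch below. `FixedSink` and `writeWithExtension` are hypothetical names for illustration, not Phobos APIs, and unlike the real `setExtension` this version simply appends the extension. The function only calls `put` on whatever it is given, so the caller decides where the characters end up; here that is a stack buffer, and the `@nogc` annotations have the compiler check that no GC allocation takes place.

```d
// Hypothetical fixed-capacity sink living entirely on the stack.
struct FixedSink
{
    char[64] buf;
    size_t len;

    void put(const(char)[] s) @nogc nothrow
    {
        // Writing past the end of buf trips the array bounds check.
        buf[len .. len + s.length] = s[];
        len += s.length;
    }

    const(char)[] data() const @nogc nothrow
    {
        return buf[0 .. len];
    }
}

// Hypothetical helper: writes path followed by ext into any type with a compatible put().
void writeWithExtension(Sink)(ref Sink sink, const(char)[] path, const(char)[] ext)
{
    sink.put(path);
    sink.put(ext);
}

void main() @nogc
{
    FixedSink sink;
    writeWithExtension(sink, "hello", ".txt");
    assert(sink.data() == "hello.txt");
}
```

Passing a growable sink instead (for example one backed by the GC or by malloc) would not require any change to `writeWithExtension`.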

As for the _allocation_ strategy: It indeed needs to be configurable, but here, the same objections against a template parameter apply. As the allocator doesn't necessarily need to be part of the type, a (thread) global variable can be used to specify it. This lends itself well to idioms like


    with(MyAllocator alloc) {
        // ...
    }
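
The `with` block above reads as shorthand for the idea. One way to approximate it in current D is a module-level variable, which is thread-local by default, plus a small guard struct that swaps the allocator in and restores the previous one when the scope ends. Every name in this sketch (`Allocator`, `CollectedAllocator`, `MallocAllocator`, `currentAllocator`, `UseAllocator`) is an assumption made for the example and is not taken from std.allocator.

```d
import core.stdc.stdlib : free, malloc;

// Hypothetical allocator interface, for illustration only.
interface Allocator
{
    void[] allocate(size_t size);
    void deallocate(void[] block);
}

final class CollectedAllocator : Allocator
{
    void[] allocate(size_t size) { return new ubyte[](size); }
    void deallocate(void[] block) { /* left to the garbage collector */ }
}

final class MallocAllocator : Allocator
{
    size_t live; // number of blocks currently handed out

    void[] allocate(size_t size)
    {
        auto p = malloc(size);
        if (p is null) return null;
        ++live;
        return p[0 .. size];
    }

    void deallocate(void[] block)
    {
        if (block.ptr is null) return;
        free(block.ptr);
        --live;
    }
}

// Module-level variables are thread-local by default in D.
Allocator currentAllocator;

static this() { currentAllocator = new CollectedAllocator; }

// Scope guard: installs an allocator, restores the previous one on scope exit.
struct UseAllocator
{
    private Allocator previous;
    @disable this();
    @disable this(this);
    this(Allocator next) { previous = currentAllocator; currentAllocator = next; }
    ~this() { currentAllocator = previous; }
}

void main()
{
    auto mallocator = new MallocAllocator;
    {
        auto guard = UseAllocator(mallocator);
        auto block = currentAllocator.allocate(32); // served by malloc
        assert(mallocator.live == 1);
        currentAllocator.deallocate(block);
    }
    // Outside the scope, the default allocator is active again.
    assert(cast(CollectedAllocator) currentAllocator !is null);
    assert(mallocator.live == 0);
}
```

Library functions would then allocate through `currentAllocator` without taking a template parameter, which is the property argued for above; the cost is an indirect call and hidden state per thread.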


Destroy!


Done :-)
