Re: On heap segregation, GC optimization and @nogc relaxing

Rikki Cattermole via Digitalmars-d Tue, 11 Nov 2014 19:16:37 -0800

On 12/11/2014 3:34 p.m., deadalnix wrote:

Hi all,


I want to get back on the subject of ownership, lifetime and propose
some solution, but before, propose to state the problem in a way that
haven't seen before (even if I have no doubt some have came to the same
conclusion in the past).

The problem at hand is double: memory management and thread safety.
Number one has been a hot topic for ages, and number 2 has become very
over the past years, to the widespreading of multicores CPU.

The problem at hand here is ownership of data. There are 3 roads you can
go about it:
  - immutability and GC. Effectively, these 2 technique allow you to get
rid of ownership. There are advantages and drawbacks i'm going to
discuss later.
  - Being unsafe and rely on convention. This is the C++ road (and a
possible road in D). It allow to implement almost any wanted scheme, but
come at great cost for the developer.
  - Annotations. This is the Rust road. It also come a great cost for
the developer, as some schemes may be non trivial to express granted the
type system, but, contrary to the C++ road, is safe.

These approach all have some very nice things going on for them, but
also some killer scenarios.

Immutability+GC allow to have safety while keeping interfaces simple.
That is of great value. It also come with some nice goodies, in the
sense that is it easy and safe to shared data without bookkeeping,
allowing one to fit more in cache, and reduce the amount of garbage
created. Most text processing apps fall into this category and this is
why D is that good at them. Another big goodies is that many lock free
algorithm become possible. Once you remove the need for bookkeeping of
ownership many operations can be implemented in an atomic manner.
Additionally, it is possible to implement various GC optimization on
immutable heap, which make the GC generally more efficient. But the cost
is also real. For some use case, this mean having a large amount of
garbage generated (Carmack wrote a piece on haskell were he mention the
disastrous effect that having a framebuffer immutable would have: you'd
have to clone it everytime you draw in it, which is a no go). GC also
tend to cause unpredictable runtime characteristics, which programs with
real time constraint can have hard time to deal with.

Relying on convention has the advantage that any scheme can be
implemented without constraint, while keeping interface simple. The
obvious drawback is that it is time consuming and error prone. It also
make a lot of things unclear, and dev choose the better safe than sorry
road. That mean excessive copying to make sure one own the data, which
is wasteful (in term of work for the copy itself, garbage generation and
cache pressure). If this must be an option locally for system code, it
doesn't seems like this is the right option at program scale and we do
it in C++ simply because we have to.

Finally, annotations are a great way to combine safety and speed, but
generally come at a great cost when implenting uncommon ownership
strategies where you ends up having to express complex lifetime and
ownership relations.

Ideally, we want to map with what the hardware does. So what does the
hardware do ?

Multicore CPU have various cores, each of them having layers of cache.
Cache is organized in cache line and each cache line can be in various
modes. Actual system are quite complex and deal with problems we are not
very interesting here (like writeback) but the general idea is that
every cache line is owned with different modes.

Either the cache line is owned by a single core and can be written to,
or the cache line shared by several cores, each of them having a local
copy of the line, but none of them can write to. There is an internal
bus where cores can exchange cache line with each other and messages to
acquire cache line in read or read/write mode. That mean CPU are good at
thread local read/write, shared immutable and transfer of ownership from
one core to the other. They are bad at shared writable data (as
effectively, the cache line will have to bounce back and forth between
cores, and all memory access will need to be serialized instead of
performed out of order).

In that world, D has a bizaro position were it use a combination of
annotations (immutable, shared) and GC. Ultimately, this is a good
solution. Using annotation for common cases, fallback on GC/unsafe code
when these annotations fall short.

Before going into why it is fallign short, a digression on GC and the
benefits of segregating the heap. In D, the heap is almost segregated in
3 groups: thread local, shared and immutable. These group are very
interesting for the GC:
  - Thread local heap can be collected while disturbing only one thread.
It should be possible to use different strategy in different threads.
  - Immutable heap can be collected 100% concurrently without any
synchronization with the program.
  - Shared heap is the only one that require disturbing the whole
program, but as a matter of good practice, this heap should be small
anyway.

Various ML family languages (like OCaml) have adopted segregated heap
strategy and get great benefice out of it. For instance, OCaml's GC is
known to outperform Java's in most scenarios.

We are sitting on a huge GC goldmine here, but 3 things prevent us to
exploit it:
  - Exceptions. They can bubble from one thread to the other and create
implicit sharing.
  - Uniqueness (as it is defined now) as it allow for unique object to
be merged with any heap.
  - message passing. Ownership transfert is not possible and so unsafe
casting ensue.

  * It has to be noted that delegate allow as well for this kind of
stunt, but this is recognized as a bug by now and hopefully it is gonna
be fixed.

D has a type qualifier system for which we pay a big price. Getting
everything const correct is difficult. We'd want to get the most bang
for the buck. One of the bang we are not far to be able to get is
segregating the heap. That mean shitty GC and unsafe code.

Let's present a concrete exemple using ownership:
pure Object foo() { ... }
immutable o = foo();

This is valid code. However, foo can do arbitrary manipulation to come
up with the object. These include various allocations. These allocation
are mutable into foo, which makes it impossible to allocate them on the
immutable heap (as a GC relying on this immutability could mess up
things pretty bad). They also cannot be allocated on the TL heap as once
promoted to immutable, the data become shared as well.

On the other hand, ownership means that the compiler can know when
things go out of scope and free them explicitly. Which is a plus as
generating less garbage is always a way to improve garbage collection.
The most efficient work there is is the one that do not need to be done.

I'd argue for the introduction of a basic ownership system. Something
much simpler than rust's, that do not cover all uses cases. But the good
thing is that we can fallback on GC or unsafe code when the system show
its limits. That mean we rely less on the GC, while being able to
provide a better GC.

We already pay a cost at interface with type qualifier, let's make the
best of it ! I'm proposing to introduce a new type qualifier for owned
data.

Now it means that throw statement expect a owned(Throwable), that pure
function that currently return an implicitly unique object will return
owned(Object) and that message passing will accept to pass around owned
stuff.

The GC heap can be segregated into island. We currently have 3 types of
islands : Thread local, shared and immutable. These are builtin island
with special characteristics in the language. The new qualifier
introduce a new type of island, the owned island.

owned island can only refers to other owned island and to immutable.
they can be merged in any other island at any time (that is why they
can't refers to TL or shared).

owned(T) can be passed around as function parameter or returned, or
stored as fields. When doing so they are consumed. When an owned is not
consumed and goes out of scope, the whole island is freed.

That means that owned(T) can implicitly decay into T, immutable(T),
shared(T) at any time. When doing so, a call to the runtime is done to
merge the owned island to the corresponding island. It is passed around
as owned, then the ownership is transferred and all local references to
the island are invalidated (using them is an error).

On an implementation level, a call to a pure function that return an
owned could look like this :

{
   IslandID __saved = gc_switch_new_island();
   scope(exit) gc_restore_island(__saved);

   call_pure_function();
}

This allow us to rely much less on the GC and allow for a better GC
implementation.

@nogc . Remember ? It was in the title. What does a @nogc function look
like ? a no gc function o not produce any garbage or trigger the
collection cycle. there is no reason per se to prevent the @nogc code to
allocate on the GC as long as you know it won't produce garbage. That
mean the only operation you need to ban are the one that merge the owned
things into TL, shared or immutable heap.

This solves the problem of the @nogc + Exception. As Exception are
isolated, they can be allocated, throw and catched into @nogc code
without generating garbage. They can safely bubble out of the @nogc
section of the code and still be safe.

The same way, it open the door for a LOT of code that is not @nogc to
be. If the code allocate memory in an owned island and return it, then
it is now up to the caller to decide whether is want's it garbage
collected or keep it as owned (and/or make it reference counted for
instance).

The solution of passing a policy at compile for allocation is close to
what C++'s stdlib is doing, and even if the proposed approach by Andrei
is better, I don't think this is a good one. The proposed approach allow
for a lot of code to be marked as @nogc and allow for the caller to
decide. That is ultimately what we want libraries to look like.


Humm.

import std.stdio;

struct TypeOwnerShip(T) {
        T value;
        alias value this;

        this(T)(T value) {
                this.value = value;     
        }
        
        // implict casts to immutable, shared?
        // on cast to immutable, shared change islands
}

T owned(T)(T value) {
        return TypeOwnerShip!T(value);  
}


class Bar {
        int x;
        
        this(int x) pure {
                this.x = x;     
        }
}

Bar foo() pure {
        return owned(new Bar(5));
}

struct IslandAllocationStrategy {
        this(ubyte v = 0) {
        }
        
        void opWithIn() {
                writeln("opWithIn");
                // thread local overriding
        }
        
        void opWithOut() {
                import std.stdio;
                writeln("opWithOut");
                // reset thread local overriding
        }
}

@property IslandAllocationStrategy island() {
        return IslandAllocationStrategy();
}

void main() {
        writeln("start");
        with(island) {
                opWithIn;
                writeln("{");
                
                Bar myValue = foo();
                writeln(myValue.x);
                
                
                writeln("}");
                opWithOut;
        }
        writeln("end");
}

I feel like I've suggested this, just without the CS theory.

Re: On heap segregation, GC optimization and @nogc relaxing

Reply via email to