Chris Wright wrote:
On Sat, 12 Mar 2016 13:23:35 -0800, Adam Wilson wrote:

To start off, let's talk terminology. You seem to be using nonstandard
terminology and possibly misunderstanding standard terminology.

A GC scan is the mark phase of a mark/sweep collector (and specifically
the part where the GC examines roots and allocated memory, if that
matters). The GC searches for pointers to GC memory. Simple GCs only need
to record whether there is a pointer to a given allocation, not where
those pointers are coming from.

A conservative GC is one that will encounter some things that aren't
pointers but must be scanned as if they were pointers. In D, for
instance, a union between a pointer and a size_t forces collectors to be
conservative.

A false pointer is something that a conservative GC must treat as a
pointer and appears to point to an object allocated by that GC. For
instance, unions in D allow us to create false pointers.
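
To make the union/false-pointer point concrete, here is a minimal D sketch
(names are purely illustrative):

import core.memory : GC;

// A union that overlays an integer and a pointer. The collector cannot
// tell which member is "live", so it must scan the word conservatively.
union Ambiguous
{
    size_t n;
    void*  p;
}

void main()
{
    auto block = new ubyte[64];      // GC-allocated
    Ambiguous a;
    a.n = cast(size_t) block.ptr;    // to the program this is just a number,
                                     // but it looks exactly like a pointer,
                                     // so a conservative scan keeps (and pins)
                                     // the block it appears to point to
    GC.collect();
}

If a.n had merely happened to coincide with a heap address, that would be a
false pointer in the sense described above.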

A precise GC is a GC that never mistakes a non-pointer for a pointer.

A moving GC is a GC that can relocate objects.

A generational GC is one that can perform a collection on part of the
heap without dealing with the full heap. Those parts tend to be organized
by how long an object has survived. This does not require a precise or
moving GC, though it works better with a precise, moving GC.


I didn't intend to confuse, but I was not writing for neophytes, and I used linguistic shortcuts.

A write barrier is a way to record which sections of memory have been
altered since the last GC collection. This lets the GC scan only the
changed data -- useful to any GC, especially useful to generational ones.
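
As an aside, the mechanism usually amounts to something like this
card-marking sketch (illustrative only, not how druntime works today):

enum cardSize = 512;           // one "card" covers 512 bytes of heap

__gshared ubyte[] cardTable;   // one byte per card
__gshared void*   heapBase;    // start of the managed heap

// Conceptually emitted by the compiler after every pointer store.
void writeBarrier(void* slotAddr)
{
    auto card = (cast(size_t) slotAddr - cast(size_t) heapBase) / cardSize;
    cardTable[card] = 1;       // the next collection rescans only dirty cards
}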

A "partially moving" GC does not exist, as far as I know.


Yep, it's a Bad Idea.

Jeremy DeHaan wrote:
On Saturday, 12 March 2016 at 08:50:06 UTC, Adam Wilson wrote:
If I may make a suggestion: the lock-free work is unlikely to require
the entirety of GSoC. And the precise GC is the next most important
thing on your list and will have the biggest impact on GC performance.

Rainer has two different precise GCs in pull requests right now, and
both are slower than the current one unless there are false pointers.
I would expect anything I come up with to largely act the same. The
reason I keep pushing for a generational garbage collector is because I
think it would be where we would see the most benefit in terms of
general performance.


I would contend that it's not quite that simple. Of course the precise
GC is slower; it's scanning more stuff.

Precise GCs are slower because they must reference some data telling them
which items on the stack and in allocations represent pointers and which
do not. This data takes up cache space, and having less cache for memory
being scanned slows things down. Correlating two pieces of data takes
extra time, too.


Yep, the data is usually held with the object in a header. Hence the "scanning more stuff".
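
Roughly, the metadata a precise collector consults looks like this (a
sketch; field names are made up):

struct BlockInfo
{
    size_t  size;        // size of the allocation in bytes
    size_t* ptrBitmap;   // one bit per word: 1 means "this word is a pointer"
}

// Mark only the words the bitmap flags as pointers; skip everything else.
void scanPrecisely(void* base, const BlockInfo info,
                   void delegate(void*) mark)
{
    immutable words = info.size / size_t.sizeof;
    foreach (i; 0 .. words)
    {
        immutable bit = (cast(size_t) 1) << (i % (8 * size_t.sizeof));
        if (info.ptrBitmap[i / (8 * size_t.sizeof)] & bit)
            mark(*(cast(void**) base + i));
    }
}

This is also where the cache pressure mentioned above comes from: the bitmap
has to be pulled in alongside the memory being scanned.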

The hope with a precise GC is to avoid false pointers. A false pointer to
an object that holds references to many other objects can be costly, both
in GC time and in excess memory usage.

In D, we have a tiny bit of type information surfaced to the GC to reduce
the incidence of false pointers. We could additionally take advantage of
the fact that numbers tend to be small and people tend to use 4 to 8 byte
numbers: we could arrange our address space to avoid ranges where false
pointers are likely. But that's probably unnecessary.


We are not limited by what we currently have. Most of Rainer's work involved surfacing more data to the GC.
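
For reference, the "tiny bit of type information" we already surface shows
up as block attributes; for example (using the existing core.memory API):

import core.memory : GC;

void main()
{
    // A block the GC knows holds no pointers: it is never scanned, so its
    // contents can never act as false pointers.
    void* raw = GC.malloc(4096, GC.BlkAttr.NO_SCAN);

    // Typed allocations get this for free: double[] contains no pointers,
    // so the runtime allocates it NO_SCAN automatically.
    auto samples = new double[512];
}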

The current conservative GC by
definition can't figure out large chunks of memory so it won't even
bother scanning them.

It must scan anything that might be a pointer. A precise GC reduces the
number of things that might be pointers but aren't to zero. Therefore a
precise GC scans less.


Yep, precise GCs aren't totally perf-negative.

If a conservative GC didn't scan something that might be a pointer, it
would sometimes free an object that still had live references to it. This
would result in memory corruption or segmentation faults.

As for what it does with those things it can't scan: what it can't scan
it has to pin, which means the memory can't be moved.

If a moving GC cannot precisely identify all pointers to an object, it is
not allowed to move that object. If it can, it is allowed to move that
object.

A conservative GC can be a moving GC. It must classify things as
"definitely pointer", "maybe pointer", and "not pointer". It cannot move
an object with a "maybe pointer" pointing to it, but it can move an
object with several "definitely pointers" pointing to it.
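
In other words, roughly this rule (sketch):

enum Kind { definitelyPointer, maybePointer, notPointer }

// An object may be relocated only if every reference to it is known exactly;
// a single "maybe" reference pins it in place.
bool canMove(const Kind[] referencesToObject)
{
    foreach (k; referencesToObject)
        if (k == Kind.maybePointer)
            return false;
    return true;
}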

In D, thanks to unions, we can never create a fully precise collector,
but we can create a moving collector. Objects pointed to from a union
can't be moved, but few objects will be pointed to from a union.


Actually, we probably could with modifications to the compiler. A little metadata goes a long way. And I know that Walter would accept that PR. I don't know for sure if it can be done, but I've seen precious little evidence one way or the other. Let's investigate before making absolute statements about what we can do. I know from discussions with Walter that the compiler's AST contains a wealth of information, and that more can be added.

You can't build a fully generational GC until you can move everything.

You can build a more efficient generational GC if you can move objects.
However, it's not impossible to build a non-moving generational GC.
Instead of having large blocks of address space for each generation, you
would have each Pool indicate its generation. Then you have to find the
Pool for a pointer and check that flag.

That's comparatively slow, but it'd still be faster to do a Gen 0 scan
that way than to do a full scan with D's current collector (assuming you
implement write barriers).
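
A rough sketch of that pool-flag approach (types and names purely
illustrative):

enum Generation : ubyte { gen0, gen1, gen2 }

struct Pool
{
    void*      base;
    size_t     size;
    Generation gen;   // which generation this pool belongs to

    bool contains(const void* p) const
    {
        immutable a = cast(size_t) p;
        return a >= cast(size_t) base && a < cast(size_t) base + size;
    }
}

__gshared Pool[] pools;

// During a gen-0 collection, only follow candidates that land in gen-0 pools.
bool youngGenCandidate(const void* p)
{
    foreach (ref pool; pools)
        if (pool.contains(p))
            return pool.gen == Generation.gen0;
    return false;   // not a GC pointer at all
}

The per-pointer pool lookup is exactly the "comparatively slow" part.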


My apologies, I didn't intend to state that it was impossible. I was attempting to emphatically question the wisdom of doing so, particularly when the potential performance benefits are dubious at best. My point is that we should do this right; if we keep blindly pursuing performance without taking a considered approach to achieving it, we'll never get there. The GC will just become an ever larger pile of hacks that increase performance in specific cases.

They talk about this problem in the books. However, once you have a
precise GC, a fully moving GC becomes possible and only then do the
performance benefits of a generational GC become realized.

The performance benefit of a moving and generational GC over a non-moving
generational GC is that you can have a contiguous section of address
space for each generation, allocated densely. You get this with less
overhead.


That's actually pretty important. Precision enables moving; moving has clear and enumerable performance benefits. QED.
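
And concretely, the payoff of a contiguous, densely allocated young
generation is that allocation degenerates to a pointer bump (sketch):

struct Nursery
{
    ubyte* cur;   // next free byte in the contiguous slab
    ubyte* end;   // end of the slab

    void* allocate(size_t bytes)
    {
        if (cur + bytes > end)
            return null;      // nursery full: time for a gen-0 collection
        auto p = cur;
        cur += bytes;         // objects end up densely packed, cache-friendly
        return p;
    }
}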

However, a moving GC can be a pessimization for user code. If you
allocate several objects in sequence, with many GCs you'll get memory
allocated for them that is contiguous or nearly contiguous. For small
objects, this is probably going to be on the same page of memory. But a
moving GC is free to move them to different pages. So you need to
exercise caution.


Yes ... and this is also dealt with in the books, including basic algorithms to help combat these problems. Additionally, as evidenced by the fact that every major GC system (JVM, CLR, et al.) implements a moving GC, I'd argue that the benefits greatly outweigh any potential risks. And we have pull requests to further optimize the algorithms. We don't have a system at all yet; let's build it first, then we can work on optimizing it.

I have concerns about doing any compaction though, mostly because D can
have both references and pointers to objects, and I don't know how we
would want to go about updating pointers. Especially since pointers to D
objects can exist in C and C++ code.


Erm, a generational GC is, in practice, a moving GC, as it must move the
blobs around between the various heaps.

The Boehm collector is generational, conservative, and not moving. You
probably conflate "moving" and "generational" because of the JVM.


Ok, yes, but let's be honest: it's what we actually want, and more importantly, expect. Precision makes it happen, efficiently. Even the C++ community doesn't take the Boehm GC very seriously. We have an opportunity to prove that a C-style language can have a performant GC.

As for the C/C++ code problem various solutions have been proposed here.
For one, D knows the difference between the two, so pinning is possible
there.

There's nothing in the type system that says whether you can pass a
variable to a C/C++ function. So, no, D *doesn't* know the difference
between them. And there's no GC.pin function, so people aren't currently
telling your GC when something's accessible to C/C++ code.


D does differentiate between a GC reference and a pointer. Pin the pointers. Yes, you can get a pointer from a reference, and doing so would be the pinning operation. We'd have to add a GC.pin function, so long as the compiler emits it for us. :)
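
To be explicit, no GC.pin exists today; what I have in mind is shaped
roughly like this (entirely hypothetical sketch):

// Hypothetical stand-in bookkeeping; core.memory.GC has no pin today.
__gshared bool[void*] pinned;

void gcPin(void* p)   { pinned[p] = true; }   // a moving GC must not relocate this block
void gcUnpin(void* p) { pinned.remove(p); }

extern (C) void some_c_function(void* data);  // assumed external C function

void passToC()
{
    auto buf = new ubyte[256];
    gcPin(buf.ptr);              // escaping a raw pointer is the pin point;
    some_c_function(buf.ptr);    // ideally the compiler would emit the pin for us
    gcUnpin(buf.ptr);
}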

That means a moving collector in D will produce segmentation faults and
memory corruption in existing, currently working code.

AFAIK each OS allows you to specify multiple heaps

You get one address space. You get to use mmap to ask for memory at a
given address. There's no heap_t data structure to provide to malloc or
mmap.

You might have been confused by literature referring to multiple heaps.
That's just a single allocator deciding to divide its address space
logically. There's no OS-level enforcement.


Windows has that concept via HeapCreate, shown here: https://msdn.microsoft.com/en-us/library/windows/desktop/aa366599(v=vs.85).aspx I'd be surprised if it's not available on other OSes, and yes, it is OS enforced. It might not work exactly how you'd like, but that's a far cry from "not provided".
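
For the record, this is the API I mean (minimal Windows-only sketch using
druntime's existing bindings):

version (Windows)
{
    import core.sys.windows.windows;   // HeapCreate, HeapAlloc, HeapFree, HeapDestroy

    void demo()
    {
        // A second, OS-managed heap, distinct from the process default heap.
        HANDLE heap = HeapCreate(0 /*options*/, 0 /*initial size*/, 0 /*growable*/);
        void*  mem  = HeapAlloc(heap, 0, 4096);

        // ... use mem ...

        HeapFree(heap, 0, mem);
        HeapDestroy(heap);
    }
}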

A concurrent GC without precision is mostly useless

The Boehm collector is not precise. It is concurrent. Its concurrency
reduces the amount of time a GC collection takes. This is mostly
orthogonal to the benefits of a precise collector in reducing the amount
of memory a process uses.

because the GC roots are primarily derived from the stack on which the
thread was spawned, and the stack roots are not scanned in a conservative
GC.

They *are* scanned in a conservative GC. If you don't scan memory that
might contain pointers (such as the stack), your GC will inevitably cause
segmentation faults and memory corruption. D's GC is conservative and
does not cause segmentation faults or memory corruption, therefore it
must scan the stack.


I may have been confusing precise GCs in general with Rainer's precise GC, which only did precise heap scans. Apologies for the confusion.
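
For completeness, the stack scan being described is conceptually just this
(illustrative sketch):

// Conservative stack scan: any word on the stack that happens to fall
// inside the GC heap is treated as a live reference.
void scanStackConservatively(void** lo, void** hi,
                             bool delegate(void*) pointsIntoGCHeap,
                             void delegate(void*) markReachable)
{
    for (void** slot = lo; slot < hi; ++slot)
    {
        void* candidate = *slot;
        if (pointsIntoGCHeap(candidate))   // might just be an int that looks like a pointer
            markReachable(candidate);      // conservatively keep it alive
    }
}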

I will admit that my wording in my last message was not my best work. However, my fundamental point is sound: precision has benefits across all GC workloads and enables scenarios beyond the GC itself. Extending the compiler can work around many of the issues with false pointers.

And let's engage the elephant in the room. One of Rainer's main lessons was that precision permeates the compiler and the GC; you have to deal with it everywhere. So you end up rebuilding the GC from the ground up when you add it. What you are proposing is building a conservative generational GC that will have to be jettisoned and rewritten *again* once precision is added, for a performance boost that, by your own admission, may not actually happen. How is this a good use of time?

My statement stands. Precision is the basis from which all high-order GC functions flow. Yes, you can work around not having it, but such workarounds tend to be just that: workarounds. More importantly, you are inevitably going to end up doing a half-implementation of precision anyway, because each of the high-order functions needs parts of that information.

I know precision isn't fun. But it's extremely important work that needs to get done. And once we have it, the world of GC algorithms opens up before us. Until then we are intentionally limiting ourselves. One of the biggest complaints about D is the GC; fine, so instead of doing yet another hack job on something that may not actually improve GC performance, let's do it right and lay the groundwork for a modern GC that allows us to really compete with the JVM/CLR.

--
// Adam Wilson
// quiet.dlang.dev
