Chris Wright wrote:
On Sat, 12 Mar 2016 13:23:35 -0800, Adam Wilson wrote:
To start off, let's talk terminology. You seem to be using nonstandard
terminology and possibly misunderstanding standard terminology.
A GC scan is the mark phase of a mark/sweep collector (and specifically
the part where the GC examines roots and allocated memory, if that
matters). The GC searches for pointers to GC memory. Simple GCs only need
to record whether there is a pointer to a given allocation, not where
those pointers are coming from.
A conservative GC is one that will encounter some things that aren't
pointers but must be scanned as if they were pointers. In D, for
instance, a union between a pointer and a size_t forces collectors to be
conservative.
A false pointer is something that a conservative GC must treat as a
pointer and appears to point to an object allocated by that GC. For
instance, unions in D allow us to create false pointers.
A precise GC is a GC that never mistakes a non-pointer for a pointer.
A moving GC is a GC that can relocate objects.
A generational GC is one that can perform a collection on part of the
heap without dealing with the full heap. Those parts tend to be organized
by how long an object has survived. This does not require a precise or
moving GC, though it works better with a precise, moving GC.
I didn't intend to confuse, but I was not writing to neophytes, and I
used linguistic shortcuts.
A write barrier is a way to record which sections of memory have been
altered since the last GC collection. This lets the GC scan only the
changed data -- useful to any GC, especially useful to generational ones.
A "partially moving" GC does not exist, as far as I know.
Yep, it's a Bad Idea.
Jeremy DeHaan wrote:
On Saturday, 12 March 2016 at 08:50:06 UTC, Adam Wilson wrote:
If I may make a suggestion. The lock free work is unlikely to require
the entirety of GSoC. And the precise GC is the next most important
thing on your list and will have the biggest impact on GC performance.
Rainer has two different precise GCs in pull requests right now and
both are slower than the current one unless there are false pointers.
I would expect anything I come up with to largely act the same. The
reason I keep pushing for a generational garbage collector is because I
think it would be where we would see the most benefit in terms of
general performance.
I would contend that it's not quite that simple. Of course the precise
GC is slower, it's scanning more stuff.
Precise GCs are slower because they must reference some data telling them
which items on the stack and in allocations represent pointers and which
do not. This data takes up cache space, and having less cache for memory
being scanned slows things down. Correlating two pieces of data takes
extra time, too.
Yep, the data is usually held with the object in a header. Hence the
"scanning more stuff".
The hope with a precise GC is to avoid false pointers. A false pointer to
an object that holds references to many other objects can be costly, both
in GC time and in excess memory usage.
In D, we have a tiny bit of type information surfaced to the GC to reduce
the incidence of false pointers. We could additionally take advantage of
the fact that numbers tend to be small and people tend to use 4 to 8 byte
numbers: we could arrange our address space to avoid ranges where false
pointers are likely. But that's probably unnecessary.
We are not limited by what we currently have. Most of Rainer's work
involved surfacing more data to the GC.
The current conservative GC by
definition can't figure out large chunks of memory so it won't even
bother scanning them.
It must scan anything that might be a pointer. A precise GC reduces the
number of things that might be pointers but aren't to zero. Therefore a
precise GC scans less.
Yep, precise GCs aren't totally perf-negative.
If a conservative GC didn't scan something that might be a pointer, it
would sometimes free an object that still had live references to it. This
would result in memory corruption or segmentation faults.
What does it do with the things it can't scan? Whatever it can't scan
it has to pin, which means the memory can't be moved.
If a moving GC cannot precisely identify all pointers to an object, it is
not allowed to move that object. If it can, it is allowed to move that
object.
A conservative GC can be a moving GC. It must classify things as
"definitely pointer", "maybe pointer", and "not pointer". It cannot move
an object with a "maybe pointer" pointing to it, but it can move an
object with several "definitely pointers" pointing to it.
In D, thanks to unions, we can never create a fully precise collector,
but we can create a moving collector. Objects pointed to from a union
can't be moved, but few objects will be pointed to from a union.
Actually, we probably could with modifications to the compiler. A little
metadata goes a long way. And I know that Walter would accept that PR. I
don't know for sure if it can be done, but I've seen precious little
evidence one way or the other. Let's investigate before making absolute
statements about what we can do. I know from discussions with Walter
that the compiler's AST contains a wealth of information, and that more
can be added.
You can't build a fully generational GC until you can move everything.
You can build a more efficient generational GC if you can move objects.
However, it's not impossible to build a non-moving generational GC.
Instead of having large blocks of address space for each generation, you
would have each Pool indicate its generation. Then you have to find the
Pool for a pointer and check that flag.
That's comparatively slow, but it'd still be faster to do a Gen 0 scan
that way than to do a full scan with D's current collector (assuming you
implement write barriers).
My apologies; I didn't intend to state that it was impossible. I was
attempting to emphatically question the wisdom of doing so, particularly
when the potential performance benefits are dubious at best. My point is
that we should do this right; if we keep blindly pursuing performance
without taking a considered approach to achieving it, we'll never get
there. The GC will just become an ever-larger pile of hacks that
increase performance in specific cases.
They talk about this problem in the books. However, once you have a
precise GC, a fully moving GC becomes possible and only then do the
performance benefits of a generational GC become realized.
The performance benefit of a moving and generational GC over a non-moving
generational GC is that you can have a contiguous section of address
space for each generation, allocated densely. You get this with less
overhead.
That's actually pretty important. Precision enables moving, moving has
clear and enumerable performance benefits. QED.
However, a moving GC can be a pessimization for user code. If you
allocate several objects in sequence, with many GCs you'll get memory
allocated for them that is contiguous or nearly contiguous. For small
objects, this is probably going to be on the same page of memory. But a
moving GC is free to move them to different pages. So you need to
exercise caution.
Yes ... and this is also dealt with in the books, including basic
algorithms to help combat these problems. Additionally, as evidenced by
the fact that every major GC system (JVM, CLR, et al.) implements a
moving GC, I'd argue that the benefits greatly outweigh any potential
risks. And we have pull requests to further optimize the algorithms. We
don't have a system at all yet; let's build it first, then we can work
on optimizing it.
I have concerns about doing any compaction
though, mostly because D can have both references and pointers to
objects, and I don't know how we would want to go about updating
pointers, especially since pointers to D objects can exist in C and
C++ code.
Erm, a generational GC is, in practice, a moving GC as it must move the
blobs around from the various heaps.
The Boehm collector is generational, conservative, and not moving. You
probably conflate "moving" and "generational" because of the JVM.
Ok, yes, but let's be honest: it's what we actually want and, more
importantly, expect. Precision makes it happen, efficiently. Even the
C++ community doesn't take the Boehm GC very seriously. We have an
opportunity to prove that a C-style language can have a performant GC.
As for the C/C++ code problem various solutions have been proposed here.
For one, D knows the difference between the two, so pinning is possible
there.
There's nothing in the type system that says whether you can pass a
variable to a C/C++ function. So, no, D *doesn't* know the difference
between them. And there's no GC.pin function, so people aren't currently
telling your GC when something's accessible to C/C++ code.
D does differentiate between a GC reference and a pointer. Pin the
pointers. Yes, you can get a pointer from a reference, and doing so
is a pinning operation. We'd have to add a GC.pin function, so long as
the compiler emits it for us. :)
That means a moving collector in D will produce segmentation faults and
memory corruption in existing, currently working code.
AFAIK each OS allows you to specify multiple heaps
You get one address space. You get to use mmap to ask for memory at a
given address. There's no heap_t data structure to provide to malloc or
mmap.
You might have been confused by literature referring to multiple heaps.
That's just a single allocator deciding to divide its address space
logically. There's no OS-level enforcement.
Windows has that concept via HeapCreate shown here:
https://msdn.microsoft.com/en-us/library/windows/desktop/aa366599(v=vs.85).aspx
I'd be surprised if it's not available on other OSes, and yes, it is OS
enforced. It might not work exactly how you like it, but that's a far
cry from "not provided".
A concurrent GC without precision is mostly useless
The Boehm collector is not precise. It is concurrent. Its concurrency
reduces the amount of time a GC collection takes. This is mostly
orthogonal to the benefits of a precise collector in reducing the amount
of memory a process uses.
because the GC roots
are primarily derived from the stack on which the thread was spawned,
and the stack roots are not scanned in a conservative GC.
They *are* scanned in a conservative GC. If you don't scan memory that
might contain pointers (such as the stack), your GC will inevitably cause
segmentation faults and memory corruption. D's GC is conservative and
does not cause segmentation faults or memory corruption, therefore it
must scan the stack.
I may have been confusing any precise GC with Rainer's precise GC, which
only did precise heap scans. Apologies for the confusion.
I will admit that I've worded things better in the past than I did in
my last message. However, my fundamental point is sound: precision has
benefits across all GC workloads and enables scenarios beyond the GC
itself. Extending the compiler can work around many of the issues with
false pointers.
And let's engage the elephant in the room. One of Rainer's main
learnings was that precision permeates the compiler and GC; you have to
deal with it everywhere. So you end up rebuilding the GC from the ground
up when you add it. What you are proposing is building a
conservative-generational GC that will have to be jettisoned and
rewritten *again* once precision is added, for a performance boost that,
by your own admission, may not actually happen. How is this a good use
of time?
My statement stands. Precision is the basis from which all high-order GC
functions flow. Yes, you can work around not having it, but such
workarounds tend to be just that: workarounds. More importantly, you are
inevitably going to end up doing a half-implementation of precision
anyway, because each of the high-order functions needs parts of that
information.
I know precision isn't fun. But it's extremely important work that needs
to get done. And once we have it, the world of GC algorithms opens
before us. Until then we are intentionally limiting ourselves. One of
the biggest complaints about D is the GC. Fine; then instead of doing
yet another hack job on something that may not actually improve GC
performance, let's do it right and lay the groundwork for a modern GC
that allows us to really compete with the JVM/CLR.
--
// Adam Wilson
// quiet.dlang.dev