On Wed, 29 Dec 2010 12:27:02 -0700, Steven Schveighoffer <schvei...@yahoo.com> wrote:
On Wed, 29 Dec 2010 14:00:17 -0500, Robert Jacques <sandf...@jhu.edu> wrote:
On Wed, 29 Dec 2010 07:37:10 -0700, Steven Schveighoffer <schvei...@yahoo.com> wrote:
On Tue, 28 Dec 2010 01:23:22 -0500, Robert Jacques <sandf...@jhu.edu> wrote:
Second, the false pointer problem disappears (for practical purposes) when you move to 64-bit.

I'm not sure I like this "solution", but you are correct. This is somewhat mitigated, however, by the way memory is allocated (I'm assuming not sparsely throughout the address space, and also low in the address space). It certainly makes it less likely that a random 64-bit long points at data, but it's not inconceivable to have 32 bits of 0 interspersed with non-zero data. For example, a struct with two ints back to back, where one int is frequently 0.
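
To make that scenario concrete, here is a minimal sketch (the struct and values are made up for illustration, and it assumes the usual little-endian x86-64 layout):

struct Pair { int id; int flags; }   // two ints laid out back to back

Pair p = Pair(0x00A3F2B0, 0);        // flags happens to be 0
// A conservative scan reads these 8 bytes as one 64-bit word. With the
// upper 32 bits all zero, the word looks like a plausible low address and
// can pin whatever heap block happens to live there.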

Ah, but the GC can allocate RAM in any section of the address space it wants, so it would be easy for the upper 32 bits to be always non-zero by design.

huh? How does the GC control whether you set one int to 0 and the other not?

Hmm, perhaps an example would be best. Consider a 64-bit thread-local GC. It might allocate RAM in the following pattern:
[2-bit Shared/Local][16-bit thread ID][6-bit tag][40-bit pointer]
So first, the address space is divided up into four regions: [11...] and [00...] are left free for user use (external allocation, memory-mapped files, etc.), while [01...] and [10...] denote shared and thread-local, respectively. Next you have the thread ID, which is one way to support thread-local allocation and thread-local GCs. Then you might have a 6-bit region that lock-free algorithms could use as a tag (and which the shared GC would ignore), or that local GCs could use for internal purposes. Finally, you have a 40-bit region containing what we normally think of as a 'pointer'.
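
Just to illustrate the idea, here is a rough sketch of that encoding in D (the field widths, enum values, and function names are my assumptions for illustration, not an actual GC design):

enum Region : ulong { userLow = 0b00, threadLocal = 0b01, sharedHeap = 0b10, userHigh = 0b11 }

ulong encode(Region region, ulong threadId, ulong tag, ulong addr)
{
    // fields: [2-bit region][16-bit thread ID][6-bit tag][40-bit address]
    assert(threadId < (1UL << 16) && tag < (1UL << 6) && addr < (1UL << 40));
    return (cast(ulong) region << 62) | (threadId << 46) | (tag << 40) | addr;
}

ulong addrOf(ulong p)     { return p & ((1UL << 40) - 1); }
ulong threadIdOf(ulong p) { return (p >> 46) & 0xFFFF; }
Region regionOf(ulong p)  { return cast(Region) (p >> 62); }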

The thing to understand is that 64-bit computers are really 40-bit computers, currently. And given that 40 bits will hold us until we get to petabytes, we should have 24 bits to add meta-info to our pointers for some time to come. So, as long as we choose this meta-info carefully, we can avoid common bit patterns and the associated false pointers.


Third, modern GCs (i.e. thread-local GCs) can further reduce the false pointer issue.

I'd rather have precise scanning :) There are issues with thread-local GCs.

The only issue with thread-local GCs is that you can't cast to immutable and then share the result across threads. And eventually, we'll have a) better ways of constructing immutable and b) a deep idup, to mitigate this.

I'm talking about collection cycles -- they necessarily need to scan both the thread-local and shared heaps because of the possibility of cross-heap pointing. Which means you gain very little from thread-local heaps. The only thing it buys you is that you can assume the shared heap has no pointers into local heaps, and local heaps have no pointers into other local heaps. But if you can't run a collection on just one heap, there is no point in separating them.

The defining feature of thread-local GCs is that they _can_ run collection cycles independently of each other.

Plus, it's easy to cast without moving the data, which currently is not undefined if you take the necessary precautions, but which would cause large problems with separate heaps.

Casting from immutable/shared would be memory-valid; it's casting to immutable or shared where movement has to occur. As casting between immutable/shared and mutable is logically invalid, I'd expect these cases to be rare (once we solve the problem of safely constructing complex immutable data).
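
For concreteness, this is the kind of cast I mean (a hedged sketch; the movement described in the comments is what the hypothetical thread-local design would require, not what druntime does today):

int[] buf = [1, 2, 3];                        // allocated on this thread's local heap
immutable int[] frozen = cast(immutable) buf; // now intended to be shared across threads
// Under per-thread heaps, this cast is where the movement would happen: the
// block has to migrate from the local heap to the shared heap, since other
// threads may end up holding references to `frozen`.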


If we have the typeinfo of a memory block for the GC to parse, we can also rule out cross-thread pointers without thread-local GCs (for unshared data).

Which basically creates all the problems of thread-local GC, with very few of the advantages.

What are the advantages? I'm not being sarcastic; I really don't know.

-Steve

The major advantage is that they match or outperform modern generational/concurrent GCs, even when backed by conservative mark-sweep collectors (according to Apple); TLGCs can also be backed by modern collectors. Essentially, TLGCs work by separately allocating and collecting objects which are not shared between threads. Since these collectors don't have to be thread-safe, they can be more efficient in their implementation, and collections only have to deal with their subset of the heap and their own stack. This reduces pause times, false pointers, etc. TLGCs are inherently parallel and have interleaved pause times, which can greatly reduce the "embarrassing pause" effect (i.e. your worker thread running out of RAM won't pause your GUI, or stop other workers from fulfilling HTTP requests).
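
As a rough sketch of the allocation side of that idea (the LocalHeap type and its methods are hypothetical, not druntime's API):

import std.algorithm.iteration : filter;
import std.array : array;

struct LocalHeap
{
    ubyte[][] blocks;   // blocks owned by exactly one thread

    void* alloc(size_t size)
    {
        auto block = new ubyte[size];  // no global lock: only this thread touches the heap
        blocks ~= block;
        return block.ptr;
    }

    // A collection scans only this thread's roots; other threads keep running.
    void collect(scope bool delegate(void*) isReachable)
    {
        blocks = blocks.filter!(b => isReachable(b.ptr)).array;
    }
}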

In D, they become even more important, since to the best of my knowledge D can never support generational GCs and only supports concurrent GCs on certain OSes. So, if D wants a competitive GC, a thread-local GC is the only cross-platform option I know of. (C function calls prevent most generational/concurrent GC algorithms from being applicable to D.)
