On Sunday, 4 December 2022 at 22:46:52 UTC, Ali Çehreli wrote:
> That's way beyond my pay grade. Explain please. :)

The reason that the GC stops threads right now is to ensure that something doesn't change in the middle of its analysis.

Consider, for example, that the GC scans addresses 0-1000 and finds nothing. Then a running thread moves a reference from memory address 2200 down to address 800 while the GC is scanning 1000-2000.

Then the GC scans 2000-3000, where the object used to be, but it isn't there anymore... and the GC has no clue it needs to scan address 800 again. It, never having seen the object, thinks the object is just dead and frees it.

Then the thread tries to use the object, leading to a crash.
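To make that sequence concrete, here is a toy, single-threaded replay of it in Python. Everything in it is made up for illustration (the dict standing in for memory, the `scan` helper, the object name); it is not how D's GC is written, it just steps through the same interleaving by hand.

```python
# Toy model: memory maps address -> name of the object referenced there.
# The "GC" scans three address ranges in order; the "thread" moves the
# reference in between two scan steps. All names here are hypothetical.

def scan(memory, lo, hi, live):
    """Mark every object referenced from an address in [lo, hi)."""
    for addr, obj in list(memory.items()):
        if lo <= addr < hi:
            live.add(obj)

def collect_with_race():
    memory = {2200: "obj"}            # the only reference to obj is at 2200
    heap = {"obj"}                    # objects currently allocated
    live = set()                      # objects the GC has seen so far

    scan(memory, 0, 1000, live)       # GC: finds nothing
    memory[800] = memory.pop(2200)    # thread: moves the reference to 800
    scan(memory, 1000, 2000, live)    # GC: finds nothing
    scan(memory, 2000, 3000, live)    # GC: the reference is gone from here

    heap &= live                      # sweep: free whatever was never marked
    return memory, heap

memory, heap = collect_with_race()
print(memory)   # the thread still holds a reference at address 800
print(heap)     # but the object it refers to has been freed
```

The reference at 800 now dangles: the object was freed even though the thread can still reach it.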

The current implementation prevents this by stopping all threads. If nothing is running, nothing can move objects around while the GC is trying to find them.

But actually stopping everything 1) requires that the GC knows which threads exist and has a way to stop them, and 2) is overkill! All it really needs to do is prevent certain operations that might change the GC's analysis while it is running, like what happened in the example. It isn't important to stop numeric work; that won't change anything the GC looks at. It isn't important to stop pointer reads either (well, not in D's GC anyway; some collectors do need to stop those).

Since what the GC cares about is where pointers are stored, it is possible to hook pointer writes specifically. Those hooks are called write barriers; they either block pointer writes or at least notify the GC about them. (And btw, not all pointer writes need to be blocked either, just ones that would point to a different memory block. So things like slice iterations can also be allowed to continue. More on my blog: http://dpldocs.info/this-week-in-d/Blog.Posted_2022_10_31.html#thoughts-on-pointer-barriers )
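As a rough illustration of the "different memory block" filter, here is a Python sketch. The fixed `BLOCK_SIZE`, `block_of` and `needs_barrier` are all assumptions for the sake of the example; a real allocator has variable-sized blocks and would look the block up differently.

```python
# Hypothetical: pretend every allocation is a fixed 1000-byte block, so the
# block a pointer belongs to can be found by integer division.
BLOCK_SIZE = 1000

def block_of(addr):
    """Index of the (pretend) memory block that contains addr."""
    return addr // BLOCK_SIZE

def needs_barrier(old_target, new_target):
    """A pointer write matters to the GC only if the pointer ends up
    referring to a different block than it did before the write."""
    return block_of(old_target) != block_of(new_target)

# Stepping a slice pointer forward stays inside the same block: no barrier.
slice_step = needs_barrier(2200, 2208)

# Repointing at something in another block has to go through the barrier.
repoint = needs_barrier(2200, 800)
```

Under these assumptions `slice_step` is false and `repoint` is true, so a loop walking a slice never has to wait on the GC.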

So what happens then:


GC scans address 0 - 1000 and finds nothing.

Then a running thread tries to move a reference from memory address 2200 down to address 800... which triggers the write barrier. The thread isn't allowed to complete this operation until the GC is done. Notice that the GC didn't have to know about this thread ahead of time, since the running thread is responsible for communicating its intentions to the GC as it happens. (Essentially, the GC holds a mutex and all pointer writes in generated D code are synchronized on it, but there are various possible implementations.)

Then the GC scans 2000-3000, and the object is still there since the write is paused! It doesn't free it.

The GC finishes its work and releases the barriers. The thread now resumes and finishes the move, with the object still alive and well. No crash.
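The mutex flavor of this can be sketched in Python too. This is only a model under assumptions: `gc_lock`, `barrier_write` and the dict-as-memory are hypothetical names, and in the real thing the barrier would be emitted by the compiler around pointer writes rather than called by hand.

```python
import threading

# Hypothetical: the mutex the GC holds for the duration of a collection.
gc_lock = threading.Lock()

def barrier_write(memory, src, dst):
    """A pointer write routed through the barrier: it waits on gc_lock,
    so it cannot complete while a collection is in progress."""
    with gc_lock:
        memory[dst] = memory.pop(src)

def scan(memory, lo, hi, live):
    """Mark every object referenced from an address in [lo, hi)."""
    for addr, obj in list(memory.items()):
        if lo <= addr < hi:
            live.add(obj)

def collect_with_barrier():
    memory = {2200: "obj"}            # the only reference to obj is at 2200
    heap = {"obj"}
    live = set()
    mover = threading.Thread(target=barrier_write, args=(memory, 2200, 800))

    with gc_lock:                         # GC starts and takes the mutex
        scan(memory, 0, 1000, live)       # finds nothing
        mover.start()                     # thread attempts the move and blocks
        scan(memory, 1000, 2000, live)    # finds nothing
        scan(memory, 2000, 3000, live)    # reference is still at 2200: marked
        heap &= live                      # sweep: obj survives
    mover.join()                          # mutex released, the move completes
    return memory, heap

memory, heap = collect_with_barrier()
print(memory)   # reference ended up at 800
print(heap)     # and the object is still allocated
```

Note that the GC never looks at a thread list here. The mover thread could have been created by anything, and it still waits, because the waiting happens inside the write itself.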

This would be a concurrent GC, one that doesn't stop threads that are doing self-contained work. It would also be more compatible with external threads, since any thread, no matter where it came from, would go through that GC mutex barrier.
