I've studied that refcount paper and some other recent gc algorithms.
http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2006/PHD/PHD-2006-10.ps
combines a number of gc algorithms into something that's a whole lot
more efficient than what pike currently uses. Some highlights:
o The reference counts aren't updated for references on the stack.
The stacks are scanned when the gc runs instead. This saves many
many updates, and it also simplifies C level programming a lot.
Only refcounts between things on the heap are counted.
o The refcounts are only updated when the gc runs. This saves a lot
of the remaining updates since if a pointer starts with value p_0
and then changes to p_1, then p_2, p_3, ..., and lastly to p_n at
the next gc, then only p_0->refs needs to be decremented and
p_n->refs needs to be incremented - all the refcount changes for
the other things it has pointed to cancel out.
o The above is accomplished by thread local logging, to make the old
p_0 value available to the gc at the next run. This means it scales
well with many cpu's.
o A generational gc uses refcounting only for old things in the heap.
New things, which are typically very short-lived, aren't refcounted
at all but instead gc'ed using a mark-and-sweep collector. This is
shown to be more effective for short-lived data, and it handles
cyclic structures without any extra effort.
o By using refcounting on old data, the gc only need to give
attention to refcounts that gets down to zero. This means the heap
can scale to any size without affecting the gc run time, as opposed
to using a mark-and-sweep collector on the whole heap. The gc time
scales only with the amount of _change_ in the heap.
o Cyclic structures in the old refcounted data that is handled
incrementally using the fact that a cyclic structure can only occur
when a refcounter is decremented to a value greater than zero.
Those things can therefore be tracked and cycle checked in the
background. The gc uses several different methods to weed out false
alarms before doing actual cycle checks.
o The gc runs entirely in its own thread. It only needs to stop the
working threads for a very short time to scan stacks etc, and they
can be stopped one at a time.
o Since all the things that need gc attention are logged on change,
there's no need for the double-linked lists.
All the above taken together puts this gc in a whole different league
compared to the one pike currently uses, so it'd be very nice indeed
to replace the old one, and it would simplify a lot of things in the
multi-cpu scenario. There are however some significant drawbacks:
o Noncyclic things are no longer destructed and freed immediately. In
particular, it means code like this no longer would work:
void foo() {
Thread.MutexKey my_lock = my_mutex->lock();
... do some work ...
// my_lock falls out of scope here when the function exits, so
// the lock is released right away.
}
Mutex locks like the above would need something like a synchronized
statement to work well (the locks must be known to the compiler so
that they can be destructed properly also when exceptions are
thrown - using explicit catch{} and rethrow everywhere would be too
cumbersome).
There's also code that opens files and sockets etc and expects them
to be automatically closed again through this method. (That
practice has been shown to be bug prone, though, so in our sources
many of those places have been fixed over time.)
Anyway, bottom line is that it would need pike level fixing. I
think the compiler can be changed to produce warnings for most
occurrences.
And there might be more immediate-destruct tricks out there too..
o The ubiquitous foo->refs would no longer be useful to anyone other
than the gc. This means the _refs() efun becomes meaningless. Afaik
it's used occasionally for some destruct tricks, but I think weak
refs can be used instead in most cases.
o _next(), _prev() and next_object() would also be defunct. It's a
minor problem - other methods to let pike code walk the heap can be
added.
o The C module interface becomes more seriously incompatible.
For starters, all code that fiddle with refcounts on the stack
become meaningless. I think most of it can be taken care of by
changing the various macros, e.g. ref_push_string would just be an
alias for push_string, and pop_stack would simply decrement
Pike_sp.
More problematic is that all code that change pointers in heap
objects would have to use new macros/functions. Above all, this
includes all C modules that fiddles with pointers in their current
storage (e.g. through a THIS macro or similar). Changes would be
trivial but widespread, and I can't think of any way to keep source
level compatibility with old modules.
So what do you think?