On Sun, May 19, 2002 at 10:43:20PM -0400, Mike Lambert wrote:
> I started to look into the GC crashes with the perlhash tests. I'm not
> sure I found the exact problem, but I found a bunch of dangerous things
> that were being done, and could possibly cause the GC problems.
> 
> restore_invariants is an ugly hack that is prone to error. You have to
> remember to restore_invariants after *any* header retrieval or memory
> allocation, or anything that could cause that. For example, take
> hash_delete. It restores invariants properly at the start of the
> procedure. But if you'll notice down below in the body, it uses
> string_compare. Due to transcoding issues, that can trigger a full GC,
> which means you need to restore_invariants inside that loop. And that
> invalidates your 'chain' which you've already placed on the local C stack,
> and restore_invariants can't fix that.
> 
> Same thing with hash_put, which calls find_bucket, which also calls
> string_compare, with all the same problems. And same deal with hash_get,
> which calls hash_lookup, which calls find_bucket, etc.
> 
> I think you should go for an index-based approach to the hashtable
> problem, to make this work nice and easily. If your concerned about code
> clarity, I'd be happy to try and go in and attempt to clean things up.
> Probably just needs a little macro usage to abstract out your hashtable
> futzoring. (Just one level of macros, no need to worry. ;)

Damn. Yep, you're right. I was mistakenly thinking I could get away
with read-only access. But as you point out, doing anything with
STRINGs can trigger allocations (and therefore collections.)

Ok, I'll finish off the original conversion to indexed access that I
began once, before giving up in disgust. The problem is not just that
you have to use indices instead of pointers; it's also that you have
to constantly go back to the buffer header before you can get
anywhere. That needs to be hidden by a macro or (my preference) an
inline function, and slows down the common case. Also, you lose the
clean sentinel value NULL (index 0 is definitely valid; index -1
introduces signedness problems.)

Let me know if you've already started a rewrite, though, so I don't
just redo it.

Something about the whole setup just feels wrong. GC shouldn't be this
intrusive. And it definitely shouldn't slow down the common case by
making the compiler defensively reload a buffer pointer every few
instructions (it'll be cached, but it fouls up reordering.)

Is there some way to make the default be that things will not get
moved around, and allow routines to volunteer to have their guts
scrambled if they know how to handle it?

Or coming from a completely different direction, do it through
exception handling and restartable atomic sequences: if something runs
out of memory during its execution it throws an exception. The caller
would then do the GC and retry the routine. As with any other
exception, the routine has to be able to restore itself to a
consistent state in the presence of an exception. (Or in other words:
it's complicated, but we'll already be paying most of the price by
supporting exceptions.)

Reply via email to