Hi Neil,

> 1. Embarrassingly - given that I already said "Nice fix" to this - I'm
> afraid I can't now see exactly why this is needed.

Argh, you're right -- when I first noticed this behavior, I was so
astonished to see my logs showing threads entering and leaving guile
mode during GC that my first move was to try and prevent this.  When
my changes got rid of this behavior, I assumed everything was
hunky-dory.  However, when pressed to explain the details, I added
more logging, which showed that the errant thread ultimately did go to
sleep at the proper time -- it just never woke up when wake_up_cond
was broadcast.

My current suspicion is that the problem lies with sleeping on the
global wake_up_cond -- each thread calls pthread_cond_wait with its
own, thread-specific heap_mutex, and waiting on one condition variable
with different mutexes at the same time is undefined, or so say the
glibc docs.  I'm testing a fix now that uses a single mutex reserved
for this purpose instead.
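
To make the idea concrete, here is a minimal standalone sketch of the
arrangement I have in mind.  It is not the actual patch and does not
use Guile's real data structures: wake_up_mutex, asleep_cond,
gc_running, sleeper and run_demo are all hypothetical names made up
for illustration.  The only point is that every waiter pairs
wake_up_cond with the same mutex, and re-checks a predicate in a loop
around pthread_cond_wait.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_SLEEPERS 16

/* Hypothetical names for illustration; not Guile's actual identifiers. */
static pthread_mutex_t wake_up_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wake_up_cond = PTHREAD_COND_INITIALIZER;
static pthread_cond_t asleep_cond = PTHREAD_COND_INITIALIZER;
static int gc_running = 0;  /* predicate guarded by wake_up_mutex */
static int asleep = 0;      /* threads that have gone to sleep */
static int woken = 0;       /* threads that woke up again */

/* A thread that goes to sleep for the duration of the "collection". */
static void *sleeper(void *arg)
{
  (void) arg;
  pthread_mutex_lock(&wake_up_mutex);
  asleep++;
  pthread_cond_signal(&asleep_cond);  /* tell the collector we are asleep */
  /* Every waiter uses the SAME mutex with wake_up_cond, and loops on
     the predicate to cope with spurious wakeups. */
  while (gc_running)
    pthread_cond_wait(&wake_up_cond, &wake_up_mutex);
  woken++;
  pthread_mutex_unlock(&wake_up_mutex);
  return NULL;
}

/* Plays the role of the thread that stays awake to manage the
   collection.  Returns the number of sleepers that woke up, or -1 if
   n_sleepers is out of range. */
int run_demo(int n_sleepers)
{
  pthread_t threads[MAX_SLEEPERS];
  int i, rc = 0, result;

  if (n_sleepers < 0 || n_sleepers > MAX_SLEEPERS)
    return -1;

  pthread_mutex_lock(&wake_up_mutex);
  gc_running = 1;
  asleep = 0;
  woken = 0;
  pthread_mutex_unlock(&wake_up_mutex);

  for (i = 0; i < n_sleepers; i++)
    {
      rc = pthread_create(&threads[i], NULL, sleeper, NULL);
      assert(rc == 0);
    }
  (void) rc;

  pthread_mutex_lock(&wake_up_mutex);
  while (asleep < n_sleepers)         /* wait until everyone is asleep */
    pthread_cond_wait(&asleep_cond, &wake_up_mutex);
  /* ... the collection itself would happen here ... */
  gc_running = 0;
  pthread_cond_broadcast(&wake_up_cond);
  pthread_mutex_unlock(&wake_up_mutex);

  for (i = 0; i < n_sleepers; i++)
    pthread_join(threads[i], NULL);

  pthread_mutex_lock(&wake_up_mutex);
  result = woken;
  pthread_mutex_unlock(&wake_up_mutex);
  return result;
}
```

Built with -pthread and called from a main(), run_demo(3) should
return 3, i.e. every sleeper sees the broadcast.  Whether the real fix
can be this simple depends on how heap_mutex is used elsewhere, which
I have not checked here.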

So why hasn't this been reported before?  I'm not really sure, except
that based on my logs, a GC involving more than two threads (one
thread stays awake, of course, to manage the collection) is kind of
rare.  It doesn't even necessarily happen during an entire run of my
SRFI-18 test suite, which lasts for several seconds and is fairly
multi-threaded.


> Is that right?  I think you suggested in one of your previous emails
> that it might be possible for thread A to enter and leave guile mode
> multiple times, but I don't see how that is possible.

It *is* possible, because a thread can enter and leave guile mode and
do a fair number of things without SCM_TICK getting called.  I don't
know if that's significant or not.


> 2. Should admin_mutex be locked in scm_c_thread_exited_p()?  I think
> it should.  (This was equally wrong when using thread_admin_mutex, of
> course; your patch doesn't make anything worse, but it's worth fixing
> this in passing if you agree.)

Sure -- wouldn't hurt.  I'll include that with whatever ends up in the
final "bug" patch.

Apologies that it takes me so long to reply to these emails.  Blame
the overhead of looping my test code all night?


Regards,
Julian

