Hi Neil,

> 1. Embarassingly - given that I already said "Nice fix" to this - I'm
> afraid I can't now see exactly why this is needed.

Argh, you're right. When I first noticed this behavior, I was so astonished to see my logs showing threads entering and leaving guile mode during GC that my first move was to try to prevent it. When my changes got rid of this behavior, I assumed everything was hunky-dory. However, when pressed to explain the details, I added more logging. It showed that the errant thread ultimately did go to sleep at the proper time; it just never woke up when wake_up_cond was broadcast on.

My current inclination is that the problem lies with sleeping on the global wake_up_cond. Each thread calls pthread_cond_wait with its own thread-specific heap_mutex, and waiting on one condition variable with different mutexes at the same time is undefined, or so say the glibc docs. I'm now testing a fix that uses a mutex reserved for this purpose instead.

So why hasn't this been reported before? I'm not really sure. Based on my logs, though, a GC involving more than two threads (one thread stays awake, of course, to manage the collection) is fairly rare. It doesn't necessarily happen even once during an entire run of my SRFI-18 test suite, which lasts several seconds and is fairly multi-threaded.

> Is that right? I think you suggested in one of your previous emails
> that it might be possible for thread A to enter and leave guile mode
> multiple times, but I don't see how that is possible.

It *is* possible, because a thread can enter and leave guile mode, and do a fair amount of work, without SCM_TICK ever being called. I don't know whether that's significant.

> 2. Should admin_mutex be locked in scm_c_thread_exited_p()? I think
> it should. (This was equally wrong when using thread_admin_mutex, of
> course; your patch doesn't make anything worse, but it's worth fixing
> this in passing if you agree.)

Sure, it wouldn't hurt. I'll include that in whatever ends up in the final "bug" patch.

Apologies for taking so long to reply to these emails. Blame the overhead of looping my test code all night?

Regards,
Julian
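P.S. For reference, here is a minimal sketch of the "dedicated mutex" idea, written against plain POSIX threads rather than Guile's sources. All names here (wake_up_mutex, gc_running, sleepers, begin_gc, end_gc, sleep_until_gc_done, current_sleepers) are hypothetical and are not Guile's actual internals. The point is just that every thread waiting on the shared condition variable passes the *same* mutex, and waits in a loop on a predicate, so spurious wakeups are harmless.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical globals, NOT Guile's real ones.  A single mutex is
   reserved for use with wake_up_cond, so every waiter passes the
   same mutex to pthread_cond_wait. */
static pthread_mutex_t wake_up_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  wake_up_cond  = PTHREAD_COND_INITIALIZER;
static bool gc_running = false; /* protected by wake_up_mutex */
static int  sleepers   = 0;     /* waiting threads; protected by wake_up_mutex */

/* Collecting thread: announce that a collection has started. */
static void begin_gc(void)
{
  pthread_mutex_lock(&wake_up_mutex);
  gc_running = true;
  pthread_mutex_unlock(&wake_up_mutex);
}

/* Any other thread: sleep until the current collection (if any) ends. */
static void sleep_until_gc_done(void)
{
  pthread_mutex_lock(&wake_up_mutex);
  sleepers++;
  /* Re-check the predicate after every wakeup; spurious wakeups are allowed. */
  while (gc_running)
    pthread_cond_wait(&wake_up_cond, &wake_up_mutex);
  sleepers--;
  pthread_mutex_unlock(&wake_up_mutex);
}

/* Collecting thread: clear the flag and wake every sleeper. */
static void end_gc(void)
{
  pthread_mutex_lock(&wake_up_mutex);
  gc_running = false;
  pthread_cond_broadcast(&wake_up_cond);
  pthread_mutex_unlock(&wake_up_mutex);
}

/* Number of threads currently inside sleep_until_gc_done. */
static int current_sleepers(void)
{
  pthread_mutex_lock(&wake_up_mutex);
  int n = sleepers;
  pthread_mutex_unlock(&wake_up_mutex);
  return n;
}
```

Because the flag is set before the collection starts and cleared under the same mutex as the broadcast, a thread that arrives after the GC has finished simply doesn't sleep, and none can miss the wakeup.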