The slides make perfect sense. As he says, the open question is what to do about it. If someone can write a relatively simple patch to improve the behavior, with a test to make sure it stays improved, I think it would have a very good chance of getting accepted into CPython. A complex patch would have less chance, because of Jesse's answer. :) Unladen Swallow (http://code.google.com/p/unladen-swallow/source/browse/tests/perf.py) would accept a benchmark that merely measures the problem, even without a suggested fix.
Here's a discussion that may illustrate why fixing this is tough:

* On a multicore machine, a waiting thread has to do some amount of work to wake up; a reasonable ballpark is ~1us. It makes sense to let the foreground thread keep making progress while the background thread is waking up, especially since the OS may not choose to wake a Python thread first. So we let the foreground thread re-acquire the GIL immediately after releasing it, in the hope that it can get a couple more checks in before the background thread actually wakes up. BUT, we don't really want to let it continue running after the waiting thread does wake up, so perhaps the waiting thread should set a flag when it wakes, forcing the foreground thread to sleep ASAP. Then the waiting thread has to wait for the GIL again, but we DON'T want it to hand control back to the OS, or we'd have wasted that wake-up time. So maybe we have it spin-wait. But what happens if the OS has actually swapped out the foreground thread for another process? Then we waste lots of time spinning. I don't know of any OS that gives us a way to do something when a thread gets swapped out; they don't even let another thread check whether a given thread is currently running.

* On a single core, any time the foreground thread spends executing after signaling a waiting thread is time the waiting thread can't use to wake up, so it makes sense to force a context switch to a particular waiting thread. This is actually pretty easy: instead of a GIL, we keep a binary semaphore per thread; to switch, the running thread ups the next thread's semaphore to instruct that thread to run, and then immediately waits on its own semaphore. The issue here is just the time it takes to switch threads: ~1us. The GIL checks currently happen every 100 ticks (a tick is roughly one opcode), which means that in arithmetic-heavy code those checks occur on the order of every microsecond too. You don't want to spend half of your time switching threads.
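The single-core scheme in the second bullet (a binary semaphore per thread, where the running thread ups the next thread's semaphore and then waits on its own) can be sketched in pure Python. This is only an illustration of the handoff idea, not CPython internals; the names HandoffGIL, wait_turn, and yield_to are made up for the sketch:

```python
import threading

class HandoffGIL:
    """One binary semaphore ("gate") per thread, all initially closed.
    A thread runs only while it holds its gate; switching is an explicit,
    directed handoff rather than a free-for-all on a single lock."""
    def __init__(self, tids):
        self.gates = {tid: threading.Semaphore(0) for tid in tids}

    def wait_turn(self, tid):
        # Block until some other thread explicitly hands us control.
        self.gates[tid].acquire()

    def yield_to(self, tid):
        # Up the target thread's gate: it, and only it, may run next.
        self.gates[tid].release()

order = []
gil = HandoffGIL(["A", "B"])

def worker(me, nxt, rounds):
    for _ in range(rounds):
        gil.wait_turn(me)   # wait to be handed control
        order.append(me)    # "run" for one quantum
        gil.yield_to(nxt)   # force the switch to a chosen thread

ta = threading.Thread(target=worker, args=("A", "B", 3))
tb = threading.Thread(target=worker, args=("B", "A", 3))
ta.start(); tb.start()
gil.yield_to("A")           # bootstrap: let A run first
ta.join(); tb.join()
print(order)                # strict alternation: A B A B A B
```

Unlike a plain lock, this guarantees the signaled thread runs next, which is exactly the property the bullet argues a single core wants; the cost is paying the ~1us thread-switch price on every handoff.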
On the other hand, as Dave pointed out, sometimes even 100 ticks isn't soon enough. I think we could solve this by checking the elapsed time on each "check" rather than unconditionally switching threads, but we might want to do something to give I/O-bound threads higher priority.

Anyway, I'm not likely to work on this any time soon, but I'm happy to review any patches someone else produces. :)

On Fri, Jun 12, 2009 at 8:16 AM, Pete <[email protected]> wrote:
> I didn't attend last night's UG, but I saw Dave give a version of this talk
> about a month ago. I'll second Carl's opinion - this talk is of critical
> importance to anyone using threads in Python.
>
> Begin forwarded message:
>
>> From: Carl Karsten <[email protected]>
>> Date: June 12, 2009 10:51:33 AM EDT
>> To: The Chicago Python Users Group <[email protected]>
>> Subject: Re: [Chicago] Posted : Video
>>
>> * David Beazley: mind-blowing presentation about how the Python GIL
>> actually works and why it's even worse than most people even imagine.
>> http://blip.tv/file/2232410 http://www.dabeaz.com/python/GIL.pdf

_______________________________________________
concurrency-sig mailing list
[email protected]
http://mail.python.org/mailman/listinfo/concurrency-sig
