Re: [Python-Dev] Removing the GIL (Me, not you!)
On 9/13/07, Greg Ewing [EMAIL PROTECTED] wrote:
> [EMAIL PROTECTED] wrote:
>> what if ... we use atomic test-and-set to handle reference counting
>> (with a lock for those CPU architectures where we haven't written the
>> necessary assembler fragment), then implement a lock for each mutable
>> type and another for global state (thread state, interpreter state,
>> etc)?
>
> Could be worth a try. A first step might be to just implement the
> atomic refcounting, and run that single-threaded to see if it has
> terribly bad effects on performance.

I've done this experiment. It was about 12% on my box. Later, once I had everything else set up so I could run two threads simultaneously, I found much worse costs. All those literals become shared objects that create contention.

I'm now working on an approach that writes out refcounts in batches to reduce contention. The initial cost is much higher, but it scales better too. I've currently got it to just under 50% cost, meaning two threads is a slight net gain.

--
Adam Olsen, aka Rhamphoryncus

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Removing the GIL (Me, not you!)
Hrvoje Nikšić wrote:
> On Thu, 2007-09-13 at 13:15 +0200, Martin v. Löwis wrote:
>>> To put it another way, would it actually matter if the reference
>>> counts for such objects became hopelessly wrong due to non-atomic
>>> adjustments?
>>
>> If they drop to zero (which may happen due to non-atomic adjustments),
>> Python will try to release the static memory, which will crash the
>> malloc implementation.
>
> More precisely, Python will call the deallocator appropriate for the
> object type. If that deallocator does nothing, the object continues to
> live. Such objects could also start out with a refcount of sys.maxint
> or so to ensure that calls to the no-op deallocator are unlikely.

The thought of adding references is amusing. What happens when a refcount becomes negative by overflow? I know, I should read the source ...

regards
Steve

--
Steve Holden  +1 571 484 6266  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
Skype: holdenweb  http://del.icio.us/steve.holden
Sorry, the dog ate my .sigline
Re: [Python-Dev] Removing the GIL (Me, not you!)
On 9/14/07, Adam Olsen [EMAIL PROTECTED] wrote:
>> Could be worth a try. A first step might be to just implement the
>> atomic refcounting, and run that single-threaded to see if it has
>> terribly bad effects on performance.
>
> I've done this experiment. It was about 12% on my box. Later, once I
> had everything else set up so I could run two threads simultaneously,
> I found much worse costs. All those literals become shared objects
> that create contention.

It's hard to argue with cold hard facts when all we have is raw speculation. What do you think of a model where there is a global thread count that keeps track of how many threads reference an object? Then there are thread-specific reference counters for each object. When a thread's refcount goes to 0, it decrefs the object's thread count. If you did this right, hopefully there would only be cache updates when you update the thread count, which would only happen when a thread first references an object and when it last references it.

I mentioned this idea earlier and it's growing on me. Since you've actually messed around with the code, do you think this would alleviate some of the contention issues?

Justin
Re: [Python-Dev] Removing the GIL (Me, not you!)
On Thu, 2007-09-13 at 18:38 -0500, [EMAIL PROTECTED] wrote:
> Hrvoje> More precisely, Python will call the deallocator appropriate for
> Hrvoje> the object type. If that deallocator does nothing, the object
> Hrvoje> continues to live. Such objects could also start out with a
> Hrvoje> refcount of sys.maxint or so to ensure that calls to the no-op
> Hrvoje> deallocator are unlikely.
>
> Maybe sys.maxint/2? You'd hate for the first incref to invoke the
> deallocator even if it was a no-op.

ob_refcnt is signed. :-)
Re: [Python-Dev] Removing the GIL (Me, not you!)
On 9/14/07, Justin Tulloss [EMAIL PROTECTED] wrote:
> On 9/14/07, Adam Olsen [EMAIL PROTECTED] wrote:
>>> Could be worth a try. A first step might be to just implement the
>>> atomic refcounting, and run that single-threaded to see if it has
>>> terribly bad effects on performance.
>>
>> I've done this experiment. It was about 12% on my box. Later, once I
>> had everything else set up so I could run two threads simultaneously,
>> I found much worse costs. All those literals become shared objects
>> that create contention.
>
> It's hard to argue with cold hard facts when all we have is raw
> speculation. What do you think of a model where there is a global
> thread count that keeps track of how many threads reference an object?
> Then there are thread-specific reference counters for each object.
> When a thread's refcount goes to 0, it decrefs the object's thread
> count. If you did this right, hopefully there would only be cache
> updates when you update the thread count, which will only be when a
> thread first references an object and when it last references an
> object.
>
> I mentioned this idea earlier and it's growing on me. Since you've
> actually messed around with the code, do you think this would
> alleviate some of the contention issues?

There would be some poor worst-case behaviour. In the case of literals you'd start referencing them when you call a function, then stop when the function returns. Same for any shared data structure.

I think caching/buffering refcounts in general holds promise, though. My current approach uses a crude hash table as a cache and only flushes when there's a collision or when the tracing GC starts up. So far I've only got about 50% of the normal performance, but that's with 90% or more scalability, and I'm hoping to keep improving it.

--
Adam Olsen, aka Rhamphoryncus
Re: [Python-Dev] Removing the GIL (Me, not you!)
At 1:51 AM -0500 9/14/07, Justin Tulloss wrote:
> On 9/14/07, Adam Olsen [EMAIL PROTECTED] wrote:
>>> Could be worth a try. A first step might be to just implement the
>>> atomic refcounting, and run that single-threaded to see if it has
>>> terribly bad effects on performance.
>>
>> I've done this experiment. It was about 12% on my box. Later, once I
>> had everything else set up so I could run two threads simultaneously,
>> I found much worse costs. All those literals become shared objects
>> that create contention.
>
> It's hard to argue with cold hard facts when all we have is raw
> speculation. What do you think of a model where there is a global
> thread count that keeps track of how many threads reference an object?
> Then there are thread-specific reference counters for each object.
> When a thread's refcount goes to 0, it decrefs the object's thread
> count. If you did this right, hopefully there would only be cache
> updates when you update the thread count, which will only be when a
> thread first references an object and when it last references an
> object.

It's likely that cache line contention is the issue, so don't glom all the different threads' refcounts for an object into one vector. Keep each thread's refcounts in a per-thread vector of objects, so only that thread will cache that vector, or make refcounts so large that each will be in its own cache line (usually 64 bytes; not too horrible for testing purposes). I don't know all of what would be required for separate vectors of refcounts, but each object could contain its index into the vectors, which would all be the same size (Go Virtual Memory!).

> I mentioned this idea earlier and it's growing on me. Since you've
> actually messed around with the code, do you think this would
> alleviate some of the contention issues?

Your idea can be combined with the maxint/2 initial refcount for non-disposable objects, which should about eliminate thread-count updates for them.

--
TonyN.:'  mailto:[EMAIL PROTECTED]
      '   http://www.georgeanelson.com/
Re: [Python-Dev] Removing the GIL (Me, not you!)
> Your idea can be combined with the maxint/2 initial refcount for
> non-disposable objects, which should about eliminate thread-count
> updates for them.

I don't really like the maxint/2 idea because it requires us to differentiate between globals and everything else. Plus, it's a hack. I'd like a more elegant solution if possible.

Justin
Re: [Python-Dev] Removing the GIL (Me, not you!)
On Fri, 14 Sep 2007 14:13:47 -0500, Justin Tulloss [EMAIL PROTECTED] wrote:
>> Your idea can be combined with the maxint/2 initial refcount for
>> non-disposable objects, which should about eliminate thread-count
>> updates for them.
>
> I don't really like the maxint/2 idea because it requires us to
> differentiate between globals and everything else. Plus, it's a hack.
> I'd like a more elegant solution if possible.

It's not really a solution either. If your program runs for a couple of minutes and then exits, maybe it won't trigger some catastrophic behavior from this hack, but if you have a long-running process then you're almost certain to be screwed over by this (it wouldn't even have to be *very* long-running - a month or two could do it on a 32-bit platform).

Jean-Paul
Re: [Python-Dev] Removing the GIL (Me, not you!)
Jean-Paul Calderone wrote:
> On Fri, 14 Sep 2007 14:13:47 -0500, Justin Tulloss wrote:
>>> Your idea can be combined with the maxint/2 initial refcount for
>>> non-disposable objects, which should about eliminate thread-count
>>> updates for them.
>>
>> I don't really like the maxint/2 idea because it requires us to
>> differentiate between globals and everything else. Plus, it's a hack.
>> I'd like a more elegant solution if possible.
>
> It's not really a solution either. If your program runs for a couple of
> minutes and then exits, maybe it won't trigger some catastrophic
> behavior from this hack, but if you have a long-running process then
> you're almost certain to be screwed over by this (it wouldn't even have
> to be *very* long-running - a month or two could do it on a 32-bit
> platform).

Could each class define the value to be added to or subtracted from the refcount? We'd only need a bit to store the value (since it would always be zero or one), but the execution time might increase quite a lot if there's no nifty way to conditionally add or subtract one.

regards
Steve

--
Steve Holden  +1 571 484 6266  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
Skype: holdenweb  http://del.icio.us/steve.holden
Sorry, the dog ate my .sigline
Re: [Python-Dev] Removing the GIL (Me, not you!)
On Fri, 14 Sep 2007 17:43:39 -0400, James Y Knight [EMAIL PROTECTED] wrote:
> On Sep 14, 2007, at 3:30 PM, Jean-Paul Calderone wrote:
>> On Fri, 14 Sep 2007 14:13:47 -0500, Justin Tulloss wrote:
>>>> Your idea can be combined with the maxint/2 initial refcount for
>>>> non-disposable objects, which should about eliminate thread-count
>>>> updates for them.
>>>
>>> I don't really like the maxint/2 idea because it requires us to
>>> differentiate between globals and everything else. Plus, it's a
>>> hack. I'd like a more elegant solution if possible.
>>
>> It's not really a solution either. If your program runs for a couple
>> of minutes and then exits, maybe it won't trigger some catastrophic
>> behavior from this hack, but if you have a long-running process then
>> you're almost certain to be screwed over by this (it wouldn't even
>> have to be *very* long-running - a month or two could do it on a
>> 32-bit platform).
>
> Not true: the refcount becoming 0 only calls a dealloc function. For
> objects which are not deletable, the dealloc function should simply
> set the refcount back to maxint/2. Done.

So, e.g., replace the Py_FatalError in none_dealloc with an assignment to ob_refcnt? Good point, sounds like it could work (I'm pretty sure you know more about deallocation in CPython than I :).

Jean-Paul
Re: [Python-Dev] Removing the GIL (Me, not you!)
On 9/14/07, Jean-Paul Calderone [EMAIL PROTECTED] wrote:
> On Fri, 14 Sep 2007 17:43:39 -0400, James Y Knight wrote:
>> Not true: the refcount becoming 0 only calls a dealloc function. For
>> objects which are not deletable, the dealloc function should simply
>> set the refcount back to maxint/2. Done.
>
> So, e.g., replace the Py_FatalError in none_dealloc with an assignment
> to ob_refcnt? Good point, sounds like it could work (I'm pretty sure
> you know more about deallocation in CPython than I :).

As I've said, this is all moot. The cache coherence protocols on x86 mean this will be nearly as slow as proper atomic refcounting, and will not scale if multiple threads regularly touch the object. My experience is that they will touch it regularly.

--
Adam Olsen, aka Rhamphoryncus
Re: [Python-Dev] Removing the GIL (Me, not you!)
Justin Tulloss wrote:
> What do you think of a model where there is a global thread count that
> keeps track of how many threads reference an object?

I've thought about that sort of thing before. The problem is how you keep track of how many threads reference an object without introducing far more overhead than you're trying to eliminate.

> Then there are thread-specific reference counters for each object.

What happens when a new thread comes into existence? Do you go through all existing objects and add another element to their refcount arrays?

--
Greg
Re: [Python-Dev] Removing the GIL (Me, not you!)
Adam Olsen wrote:
> I'm now working on an approach that writes out refcounts in batches to
> reduce contention. The initial cost is much higher, but it scales
> better too. I've currently got it to just under 50% cost, meaning two
> threads is a slight net gain.

http://www.research.ibm.com/people/d/dfb/publications.html

Look at the various papers on 'Recycler'. The way it works is that for each thread, there is an addref buffer and a decref buffer. The buffers are arrays of pointers. Each time a reference is addref'd, it's appended to the addref buffer; likewise for decref. When a buffer gets full, it is added to a queue and then a new buffer is allocated. There is a background thread that actually applies the refcounts from the buffers and frees the objects. Since this background thread is the only thread that ever touches the actual refcount field of the object, there's no need for locking.

--
Talin
Re: [Python-Dev] Removing the GIL (Me, not you!)
> What do you think?

I'm going to have to agree with Martin here, although I'm not sure I understand what you're saying entirely. Perhaps if you explained where the benefits of this approach come from, it would clear up what you're thinking.

After a few days of thought, I'm starting to realize the difficulty of maintaining compatibility with existing C extensions after removing the GIL. The possible C-level side effects are very difficult to work around without kernel- or hardware-level transaction support. I see a couple of approaches that might work (though I probably haven't thought of everything).

1. Use message passing and transactions. Put every module into its own tasklet that ends up getting owned by one thread or another. Every call to an object that is owned by that module is put into a module-wide message queue and delivered sequentially to its objects. All this does is serialize requests to objects implemented in C, to slightly mitigate the need to lock. Then use transactions to protect any Python object. You still have the problem of C side effects going unnoticed (i.e. thread A executes a function, the object sets C state in a certain way, thread B calls the same function and changes all the C state, and A reacts to a return value that no longer reflects the actual state). So this doesn't actually work, but it's close, since Python objects will remain consistent with transactions and conflicting C code won't execute simultaneously.

2. Do it Perl style. Perl just spawns off multiple interpreters and doesn't share state between them. This would require cleaning up what state belongs where, and probably providing some global state lock-free. For instance, all the numbers, letters, and None are read-only, so we could probably work out a way to share them between threads. In fact, any Python global could be read-only until it is written to. Then it belongs to the thread that wrote to it and is updated in the other threads via some sort of cache-coherency protocol. I haven't really wrapped my head around how C extensions would play with this yet, but essentially code operating in different threads would be operating on different copies of the modules. That seems fair to me.

3. Come up with an elegant way of handling multiple Python processes. Of course, this has some downsides. I don't really want to pickle Python objects around if I decide they need to be in another address space, which I would probably occasionally need to do if I abstracted away the fact that a bunch of interpreters had been spawned off.

4. Remove the GIL, use transactions for Python objects, and adapt all C extensions to be thread safe. Woo.

I'll keep kicking around ideas for a while; hopefully they'll become more refined as I explore the code more.

Justin

PS. A good paper on how hardware transactional memory could help us out: http://www-faculty.cs.uiuc.edu/~zilles/papers/python_htm.dls2006.pdf
A few of you have probably read this already. Martin is even acknowledged, but it was news to me!
Re: [Python-Dev] Removing the GIL (Me, not you!)
Phillip J. Eby wrote:
> It's not just caches and counters. It's also every built-in type
> structure, builtin module, builtin function... any Python object
> that's a built-in, period. That includes things like None, True, and
> False. Caches would include such things as the pre-created integers
> -100 through 255, the 1-byte character strings for chr(0)-chr(255),
> and the interned strings cache, to name a few. Most of these things
> I've mentioned are truly global, and not specific to an individual
> interpreter.

Pardon my ignorance, but why does Python do reference counting for truly global and static objects like None, True, False, small and cached integers, sys and other builtins? If I understand it correctly, these objects are never garbage collected (at least they shouldn't be) until the interpreter exits. Wouldn't it decrease the overhead and increase speed if Py_INCREF and Py_DECREF were no-ops for static and immutable objects?

Christian
Re: [Python-Dev] Removing the GIL (Me, not you!)
* Christian Heimes wrote:
> Pardon my ignorance but why does Python do reference counting for
> truly global and static objects like None, True, False, small and
> cached integers, sys and other builtins? If I understand it correctly
> these objects are never garbage collected (at least they shouldn't be)
> until the interpreter exits. Wouldn't it decrease the overhead and
> increase speed when Py_INCREF and Py_DECREF are no-ops for static and
> immutable objects?

The check for what kind of object you have takes time, too. Right now, just counting up or down is most likely faster than doing that check on every refcount operation.

nd
Re: [Python-Dev] Removing the GIL (Me, not you!)
On Thu, Sep 13, 2007 at 12:19:21PM +0200, André Malo wrote:
>> Pardon my ignorance but why does Python do reference counting for
>> truly global and static objects like None, True, False, small and
>> cached integers, sys and other builtins? If I understand it correctly
>> these objects are never garbage collected (at least they shouldn't
>> be) until the interpreter exits. Wouldn't it decrease the overhead
>> and increase speed when Py_INCREF and Py_DECREF are no-ops for static
>> and immutable objects?
>
> The check what kind of object you have takes time, too. Right now,
> just counting up or down is most likely faster than that check on
> every refcount operation.

To put it another way, would it actually matter if the reference counts for such objects became hopelessly wrong due to non-atomic adjustments?
Re: [Python-Dev] Removing the GIL (Me, not you!)
> To put it another way, would it actually matter if the reference
> counts for such objects became hopelessly wrong due to non-atomic
> adjustments?

If they drop to zero (which may happen due to non-atomic adjustments), Python will try to release the static memory, which will crash the malloc implementation.

Regards,
Martin
Re: [Python-Dev] Removing the GIL (Me, not you!)
On Thu, Sep 13, 2007 at 01:15:39PM +0200, Martin v. Löwis wrote:
>> To put it another way, would it actually matter if the reference
>> counts for such objects became hopelessly wrong due to non-atomic
>> adjustments?
>
> If they drop to zero (which may happen due to non-atomic adjustments),
> Python will try to release the static memory, which will crash the
> malloc implementation.

That could be avoided by a flag on the object which is checked in free(). I'm just suggesting it as an alternative, as it sounds like it might be more efficient than either locking or avoiding having reference counts on these objects (especially if the reference count is initialised to MAX_INT/2 or whatever).
Re: [Python-Dev] Removing the GIL (Me, not you!)
> Jon> To put it another way, would it actually matter if the reference
> Jon> counts for such objects became hopelessly wrong due to non-atomic
> Jon> adjustments?

I believe this was suggested and tried by someone (within the last few years). It wasn't any benefit. The costs of special-casing outweighed the costs of uniform reference counting, not to mention the code got more complex. Or something like that. Anyway, it didn't work.

Just thinking out loud here: what if ... we use atomic test-and-set to handle reference counting (with a lock for those CPU architectures where we haven't written the necessary assembler fragment), then implement a lock for each mutable type and another for global state (thread state, interpreter state, etc.)? Might that be close enough to free threading to provide some benefits, but not so fine-grained that lock contention becomes a bottleneck?

Skip
Re: [Python-Dev] Removing the GIL (Me, not you!)
On Thu, 2007-09-13 at 13:15 +0200, Martin v. Löwis wrote:
>> To put it another way, would it actually matter if the reference
>> counts for such objects became hopelessly wrong due to non-atomic
>> adjustments?
>
> If they drop to zero (which may happen due to non-atomic adjustments),
> Python will try to release the static memory, which will crash the
> malloc implementation.

More precisely, Python will call the deallocator appropriate for the object type. If that deallocator does nothing, the object continues to live. Such objects could also start out with a refcount of sys.maxint or so to ensure that calls to the no-op deallocator are unlikely.

The part I don't understand is how Python would know which objects are global/static. Testing for such a thing sounds like something that would be slower than atomic incref/decref.
Re: [Python-Dev] Removing the GIL (Me, not you!)
On Sep 13, 2007, at 10:12 AM, Martin v. Löwis wrote:
>> What do you think?
>
> I think what you are describing is the situation of today, except in a
> less-performant way. The kernel *already* implements such a
> synchronization server, except that all CPUs can act as such. You write
>
>> Since we are guaranteeing that synchronized code is running on a
>> single core, it is the equivalent of a lock at the cost of a context
>> switch.
>
> This is precisely what a lock costs today: a context switch.

Really? Wouldn't we save some memory allocation overhead (since in my design, the lock is really just a simple kernel instruction as opposed to a full-blown object), thereby lowering lock overhead (and allowing us to go with finer-grained locks)? Since we're using an asynch message queue for the synch-server, it sounds like a standard lock-free algorithm.

> Since the Python interpreter is synchronized all of the time, it would
> completely run on the synchronization server all of the time. As you
> identify, that single CPU might get overloaded, so your scheme would
> give no benefits (since Python code could never run in parallel),

I think I neglected to mention that the locking would still need to be more fine-grained - perhaps only do the context switch around refcounts (and the other places where the GIL is critical). If we can do this in a way that allows simple list comprehensions to run in parallel, that would be really helpful (like a truly parallel map function).

> and only disadvantages (since multiple Python interpreters today can
> run on multiple CPUs, but could not anymore under your scheme).

Well, you could still run Python code in parallel if you used multiple processes (each process having its own 'synchronization server'). Is that what you meant?

On Sep 13, 2007, at 12:38 PM, Justin Tulloss wrote:
> I'm going to have to agree with Martin here, although I'm not sure I
> understand what you're saying entirely. Perhaps if you explained where
> the benefits of this approach come from, it would clear up what you're
> thinking.

Well, my interpretation of the current problem is that removing the GIL has not been productive because of problems with lock contention on multi-core machines. Naturally, we need to make the locking more fine-grained to resolve this. Hopefully we can do so in a way that does not increase the lock overhead (hence my suggestion for a lock-free approach using an asynch queue and a core as a dedicated server). If we can somehow guarantee that all GC operations (which is why the GIL is needed in the first place) run on a single core, we get locking for free without actually having to have threads spinning.

regards,
Prateek
Re: [Python-Dev] Removing the GIL (Me, not you!)
>>> Since we are guaranteeing that synchronized code is running on a
>>> single core, it is the equivalent of a lock at the cost of a context
>>> switch.
>>
>> This is precisely what a lock costs today: a context switch.
>
> Really? Wouldn't we save some memory allocation overhead (since in my
> design, the lock is really just a simple kernel instruction as opposed
> to a full blown object)

The GIL is a single variable, not larger than 50 bytes or so. Locking it requires no memory at all in user-space, and might require 8 bytes or so per waiting thread in kernel-space.

> thereby lowering lock overhead

Why do you think lock overhead is related to memory consumption?

> Since we're using an asynch message queue for the synch-server, it
> sounds like a standard lock-free algorithm.

You lost me here. What are you trying to achieve? It's not the lock that people complain about, but that Python runs serially most of the time.

> I think I neglected to mention that the locking would still need to be
> more fine grained - perhaps only do the context switch around refcounts
> (and the other places where the GIL is critical).

I think this is the point where I need to say good luck implementing it.

> Well, my interpretation of the current problem is that removing the
> GIL has not been productive because of problems with lock contention
> on multi-core machines.

My guess is that this interpretation is wrong. It was reported that there was a slowdown by a factor of 2 in a single-threaded application. That can't be due to lock contention.

> If we can somehow guarantee all GC operations (which is why the GIL is
> needed in the first place)

No, unless we disagree on what a GC operation is.

Regards,
Martin
Re: [Python-Dev] Removing the GIL (Me, not you!)
On Sep 13, 2007, at 9:25 PM, Martin v. Löwis wrote:

>> Since we are guaranteeing that synchronized code is running on a single core, it is the equivalent of a lock at the cost of a context switch.
>
> This is precisely what a lock costs today: a context switch.
>
>> Really? Wouldn't we save some memory allocation overhead (since in my design, the lock is really just a simple kernel instruction as opposed to a full-blown object)
>
> The GIL is a single variable, not larger than 50 bytes or so. Locking it requires no memory at all in user space, and might require 8 bytes or so per waiting thread in kernel space.
>
>> thereby lowering lock overhead
>
> Why do you think lock overhead is related to memory consumption?

Well, it can be one (or both) of two things: 1) memory consumption, 2) the cost of acquiring and releasing the locks (which you said is the same as a context switch). Since we've also identified (according to GvR's post: http://www.artima.com/weblogs/viewpost.jsp?thread=214235) that the slowdown was 2x in a single-threaded application (which couldn't be due to lock contention), it must be due to lock overhead (unless the programming was otherwise faulty or there is something else about locks that I don't know about - Martin?). Hence I'm assuming that we need to reduce lock overhead. If acquiring and releasing locks (part of lock overhead) is a simple context switch (and I don't doubt you here), then the only remaining thing to optimize is memory operations related to lock objects.

>> Since we're using an asynch message queue for the synch-server, it sounds like a standard lock-free algorithm.
>
> You lost me here. What are you trying to achieve? It's not the lock that people complain about, but that Python runs serially most of the time.

http://en.wikipedia.org/wiki/Lock-free_and_wait-free_algorithms#The_lock-free_approach

Specifically, I'm trying to achieve the approach using a "deposit request".
>> I think I neglected to mention that the locking would still need to be more fine-grained - perhaps only do the context switch around refcounts (and the other places where the GIL is critical).
>
> I think this is the point where I need to say "good luck implementing it".

I don't mean to be unhelpful. It's just that this discussion started because people (not me - although I would definitely benefit) showed interest in removing the GIL.

>> Well, my interpretation of the current problem is that removing the GIL has not been productive because of problems with lock contention on multi-core machines.
>
> My guess is that this interpretation is wrong. It was reported that there was a slowdown by a factor of 2 in a single-threaded application. That can't be due to lock contention.

I agree with your point Martin (see my analysis above). Regarding lock contention: I'm guessing that if single-threaded applications are so badly affected, then the cumulative overhead on multithreaded applications will be even worse. So we need to reduce the overhead. But since all Python code runs under the GIL - which is a pretty coarse lock - we have to make the new locking more fine-grained (which is what I think the original patch by Greg Stein did). I'm also guessing that if you do that, then for each refcount change you're going to have to acquire a lock... which happens *very* frequently (and I think by your earlier responses you concur). So that means any time multiple threads access the same object, they will need to do an incref/decref - e.g. if you access a global variable inside a for-loop from multiple threads.

>> If we can somehow guarantee all GC operations (which is why the GIL is needed in the first place)
>
> No, unless we disagree on what a GC operation is.

Ok. Other people know more about the specifics of the GIL than I do. However, the main issue with removing the GIL seems to be the reference counting algorithm. That is what I was alluding to.
In any case, it is not relevant for the rest of the discussion.

regards,
Prateek
Re: [Python-Dev] Removing the GIL (Me, not you!)
> http://www.artima.com/weblogs/viewpost.jsp?thread=214235) that the slowdown was 2x in a single threaded application (which couldn't be due to lock contention), it must be due to lock overhead (unless the programming was otherwise faulty or there is something else about locks that I don't know about - Martin?). Hence I'm assuming that we need to reduce lock overhead. If acquiring and releasing locks (part of lock overhead) is a simple context switch (and I don't doubt you here), then the only remaining thing to optimize is memory operations related to lock objects.

I think you are putting too many assumptions on top of each other. It might also have been that the locks in the slow implementation were too fine-grained, and that some performance could have been regained by making them coarser again.

>> You lost me here. What are you trying to achieve? It's not the lock that people complain about, but that Python runs serially most of the time.
>
> http://en.wikipedia.org/wiki/Lock-free_and_wait-free_algorithms#The_lock-free_approach

The asynchronous model assumes that the sender can continue to process data without needing a reply. This is not true for the Python threading model: if the thread needs access to some data structure, it really needs to wait for the result of that access, because that's the semantics of the operations.

> Specifically, I'm trying to achieve the approach using a "deposit request".

For that to work, you need to produce a list of requests that can be processed asynchronously. I can't see any in the Python interpreter.

> I'm also guessing that if you do that then for each refcount you're going to have to acquire a lock... which happens *very* frequently (and I think by your earlier responses you concur).

In that it occurs frequently - not in that you have to acquire a lock to modify the refcount. You don't.

> Ok. Other people know more about the specifics of the GIL than I do.
> However, the main issue with removing the GIL seems to be the reference counting algorithm.

It isn't. Reference counting could be done easily without the GIL. It's rather the container objects, and the global variables, that need protection.

Regards,
Martin
Re: [Python-Dev] Removing the GIL (Me, not you!)
On 9/13/07, Hrvoje Nikšić [EMAIL PROTECTED] wrote:
> On Thu, 2007-09-13 at 13:15 +0200, Martin v. Löwis wrote:
>>> To put it another way, would it actually matter if the reference counts for such objects became hopelessly wrong due to non-atomic adjustments?
>>
>> If they drop to zero (which may happen due to non-atomic adjustments), Python will try to release the static memory, which will crash the malloc implementation.
>
> More precisely, Python will call the deallocator appropriate for the object type. If that deallocator does nothing, the object continues to live. Such objects could also start out with a refcount of sys.maxint or so to ensure that calls to the no-op deallocator are unlikely.

The part I don't understand is how Python would know which objects are global/static. Testing for such a thing sounds like something that would be slower than atomic incref/decref.

I've explained my experiments here:
http://www.artima.com/forums/flat.jsp?forum=106&thread=214235&start=30&msRange=15#279978

Basically though, atomic incref/decref won't work. Once you've got two threads modifying the same location the costs skyrocket. Even without being properly atomic you'll get the same slowdown on x86 (whose cache coherency is fairly strict.) The only two options are:

A) Don't modify an object on every incref/decref. Deletion must be delayed. This lets you share (thread-safe) objects.

B) Don't share *any* objects. This is a process model (even if they're lightweight like erlang). For the near future, it's much easier to do this using real processes though.

Threading is much more powerful, but it remains to be proven that it can be done efficiently.

--
Adam Olsen, aka Rhamphoryncus
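Option A ("don't modify an object on every incref/decref") can be illustrated with a toy buffered-refcount scheme in Python. This is only a sketch of the general idea, with made-up names (`_buffer`, `flush`); real implementations do this on C structs, and a thread takes the shared lock once per batch rather than once per operation:

```python
import collections
import threading

# Shared, authoritative counts, touched only during a batch flush.
shared_counts = collections.defaultdict(int)
shared_lock = threading.Lock()
local = threading.local()          # per-thread delta buffer

def incref(obj_id):
    _buffer(obj_id, +1)

def decref(obj_id):
    _buffer(obj_id, -1)

def _buffer(obj_id, delta, batch=256):
    buf = getattr(local, "buf", None)
    if buf is None:
        buf = local.buf = collections.defaultdict(int)
    buf[obj_id] += delta           # no shared memory touched here
    if len(buf) >= batch:
        flush()

def flush():
    buf = getattr(local, "buf", None)
    if not buf:
        return
    with shared_lock:              # one lock acquisition per batch
        for obj_id, delta in buf.items():
            shared_counts[obj_id] += delta
            # deletion is delayed: only a flush can observe a true zero
    buf.clear()

for _ in range(10):
    incref(1)
decref(1)
flush()
print(shared_counts[1])   # 9
```

The trade-off matches the thread's description: the shared cache line is written far less often, at the price of deferring deallocation until the deltas are merged.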
Re: [Python-Dev] Removing the GIL (Me, not you!)
On 9/13/07, Justin Tulloss [EMAIL PROTECTED] wrote:
> 1. Use message passing and transactions. [...]
> 2. Do it perl style. [...]
> 3. Come up with an elegant way of handling multiple python processes. [...]
> 4. Remove the GIL, use transactions for python objects, [...]

The SpiderMonkey JavaScript engine takes a very different approach, described here:
http://developer.mozilla.org/en/docs/SpiderMonkey_Internals:_Thread_Safety

The SpiderMonkey C API threading model should sound familiar: C code can assume that simple operations, like dictionary lookups, are atomic and thread-safe. C code must explicitly JS_SuspendRequest() before doing blocking I/O or number-crunching (just like Py_BEGIN_ALLOW_THREADS). The main difference is that SpiderMonkey's requests are not mutually exclusive, the way the GIL is. SpiderMonkey does fine-grained locking for mutable objects to avoid race conditions.

The clever bit is that SpiderMonkey's per-object locking does *not* require a context switch or even an atomic instruction, in the usual case where an object is *not* shared among threads. (Programs that embed SpiderMonkey therefore run faster if they manage to ensure that threads share relatively few mutable objects. JavaScript doesn't have modules.)

Suppose Python went this route. There would still have to be a stop-the-world global lock, because the cycle collector won't work if other threads are going about changing pointers. (SpiderMonkey's GC does the same thing.) Retaining such a lock has another advantage: this change could be completely backward-compatible for extensions. Just use this global lock as the GIL when entering a non-thread-safe extension (all existing extensions would be considered non-thread-safe). This means non-thread-safe extensions would be hoggish (but not much worse than they are already!). Making an existing extension thread-safe would require some thought, but it wouldn't be terribly hard.
In the simplest cases, the extension writer could just add a flag to the type saying "ok, I'm thread-safe".

Refcounting is another major issue. SpiderMonkey uses GC instead. CPython would need to do atomic increfs/decrefs. (Deferred refcounting could mitigate the cost.)

The main drawback (aside from the amount of work) is the patent. SpiderMonkey's license grants a worldwide, royalty-free license, but not under the Python license. I think this could be wrangled, if the technical approach looks worthwhile.

-j
Re: [Python-Dev] Removing the GIL (Me, not you!)
On 9/13/07, Justin Tulloss [EMAIL PROTECTED] wrote:
> On 9/13/07, Adam Olsen [EMAIL PROTECTED] wrote:
>> Basically though, atomic incref/decref won't work. Once you've got two threads modifying the same location the costs skyrocket. Even without being properly atomic you'll get the same slowdown on x86 (whose cache coherency is fairly strict.)
>
> I'm a bit skeptical of the actual costs of atomic incref. For there to be contention, you would need to be modifying the same memory location at the exact same time. That seems unlikely to ever happen. We can't bank on it never happening, but an occasionally expensive operation is ok. After all, it's occasional.

That was my initial expectation too. However, the incref *is* a modification. It's not simply an issue of the exact same time, but anything that causes the cache entries to bounce back and forth and delay the rest of the pipeline.

If you have a simple loop like `for i in range(count): 1.0+n`, then the 1.0 literal will get shared between threads, and the refcount will get hammered.

Is it reasonable to expect that much sharing? I think it is. Literals are an obvious example, but there's also configuration data passed between threads. Pystone seems to have enough sharing to kill performance. And after all, isn't sharing the whole point (even in the definition) of threads?

--
Adam Olsen, aka Rhamphoryncus
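The "shared literal" claim is easy to verify from Python itself: a literal in a function body is stored once, in the code object's `co_consts` tuple, so every thread executing that function touches the refcount of the *same* object. A small demonstration (the 1000-reference list stands in for many threads holding references):

```python
import sys

def f(n):
    return 1.0 + n

# The literal 1.0 lives exactly once in f's code object; every call to
# f, from any thread, increfs and decrefs that single shared object.
lit = next(c for c in f.__code__.co_consts if c == 1.0)

r0 = sys.getrefcount(lit)
xs = [lit] * 1000       # simulate many simultaneous references to it
r1 = sys.getrefcount(lit)
print(r1 - r0)          # 1000: all point at the same float object
```

Under the GIL those refcount updates are harmless; without it, every one of them is a write to the same cache line, which is exactly the bouncing described above.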
Re: [Python-Dev] Removing the GIL (Me, not you!)
On 9/13/07, Jason Orendorff [EMAIL PROTECTED] wrote: On 9/13/07, Justin Tulloss [EMAIL PROTECTED] wrote: 1. Use message passing and transactions. [...] 2. Do it perl style. [...] 3. Come up with an elegant way of handling multiple python processes. [...] 4. Remove the GIL, use transactions for python objects, [...] The SpiderMonkey JavaScript engine takes a very different approach, described here: http://developer.mozilla.org/en/docs/SpiderMonkey_Internals:_Thread_Safety This is basically the same as what perl does, as far as I understand it. There are differences, but they're not that substantial. It's basically the idea of keeping all state separate and treating global access as a special case. I think this is a pretty solid approach, since globals shouldn't be accessed that often. What we would want to do differently is make sure that read-only globals can be cheaply accessed from any thread. Otherwise we lose the performance benefit of having them in the first place. Refcounting is another major issue. SpiderMonkey uses GC instead. CPython would need to do atomic increfs/decrefs. (Deferred refcounting could mitigate the cost.) This is definitely something to think about. I don't really have an answer straight off, but there are several things we could try. The main drawback (aside from the amount of work) is the patent. SpiderMonkey's license grants a worldwide, royalty-free license, but not under the Python license. I think this could be wrangled, if the technical approach looks worthwhile. I'm not sure this is an issue. It's not like we would be using the code, just the patented algorithm. Any code we wrote to implement the algorithm would of course be covered under the python license. I'm not a legal guy though. Justin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Removing the GIL (Me, not you!)
On 9/13/07, Adam Olsen [EMAIL PROTECTED] wrote: Basically though, atomic incref/decref won't work. Once you've got two threads modifying the same location the costs skyrocket. Even without being properly atomic you'll get the same slowdown on x86 (who's cache coherency is fairly strict.) I'm a bit skeptical of the actual costs of atomic incref. For there to be contention, you would need to have to be modifying the same memory location at the exact same time. That seems unlikely to ever happen. We can't bank on it never happening, but an occasionally expensive operation is ok. After all, it's occasional. Justin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Removing the GIL (Me, not you!)
Martin v. Löwis wrote:
> Now we are getting into details: you do NOT have to lock an object to modify its reference count. An atomic increment/decrement operation is enough.

I stand corrected. But if it were as simple as that, I think it would have been done by now. I got the impression that this had already been tried, and it was still too slow.

--
Greg
Re: [Python-Dev] Removing the GIL (Me, not you!)
Hrvoje More precisely, Python will call the deallocator appropriate for Hrvoje the object type. If that deallocator does nothing, the object Hrvoje continues to live. Such objects could also start out with a Hrvoje refcount of sys.maxint or so to ensure that calls to the no-op Hrvoje deallocator are unlikely. Maybe sys.maxint/2? You'd hate for the first incref to invoke the deallocator even if it was a no-op. Skip ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Removing the GIL (Me, not you!)
On Thu, Sep 13, 2007 at 06:38:05PM -0500, [EMAIL PROTECTED] wrote: Hrvoje More precisely, Python will call the deallocator appropriate for Hrvoje the object type. If that deallocator does nothing, the object Hrvoje continues to live. Such objects could also start out with a Hrvoje refcount of sys.maxint or so to ensure that calls to the no-op Hrvoje deallocator are unlikely. Maybe sys.maxint/2? You'd hate for the first incref to invoke the deallocator even if it was a no-op. I do believe I already suggested that ;-) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Removing the GIL (Me, not you!)
Jon Ribbens wrote:
> To put it another way, would it actually matter if the reference counts for such objects became hopelessly wrong due to non-atomic adjustments?

Again, it would cost time to check whether you could get away with doing non-atomic refcounting. If you're thinking that no check would be needed because only things like True, False and None would be shared between threads, that's quite wrong. If the threads are to communicate at all, they need to share some kind of data somewhere.

Also keep in mind that there is one case of wrong refcounting that would be disastrous, which is the case where the refcount becomes zero prematurely.

--
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand
Carpe post meridiem! (I'm not a morning person.)
Re: [Python-Dev] Removing the GIL (Me, not you!)
[EMAIL PROTECTED] wrote:
> what if ... we use atomic test-and-set to handle reference counting (with a lock for those CPU architectures where we haven't written the necessary assembler fragment), then implement a lock for each mutable type and another for global state (thread state, interpreter state, etc)?

Could be worth a try. A first step might be to just implement the atomic refcounting, and run that single-threaded to see if it has terribly bad effects on performance.

--
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand
Carpe post meridiem! (I'm not a morning person.)
Re: [Python-Dev] Removing the GIL (Me, not you!)
Prateek Sureka wrote:
> Naturally, we need to make the locking more fine-grained to resolve this. Hopefully we can do so in a way that does not increase the lock overhead (hence my suggestion for a lock free approach using an asynch queue and a core as dedicated server).

What you don't seem to see is that this would have no less overhead, and probably a lot *more*, than a mutex or other standard synchronisation mechanism. Certainly a lot more than an atomic instruction for the incref/decref.

--
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand
Carpe post meridiem! (I'm not a morning person.)
Re: [Python-Dev] Removing the GIL (Me, not you!)
Jason Orendorff wrote:
> The clever bit is that SpiderMonkey's per-object locking does *not* require a context switch or even an atomic instruction, in the usual case where an object is *not* shared among threads.

How does it tell whether an object is shared between threads? That sounds like the really clever bit to me.

--
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand
Carpe post meridiem! (I'm not a morning person.)
Re: [Python-Dev] Removing the GIL (Me, not you!)
Pardon me for talking with no experience in such matters, but...

Okay, incrementing a reference counter is atomic, therefore the cheapest possible operation. Is it possible to keep reference counting atomic in a multi-thread model? Could you do the following... let's consider two threads, A and B. Each time an object is created, a reference count is created in both A and B. Let's suppose A has a real reference and B has no reference really. Couldn't the GC check two reference registers for a reference count? The object would then be cleaned up only if both registers were 0.

To exploit multiple CPUs, you could have two persistent Python processes, one on each CPU, each with its own mini-GIL. Object creation would then involve a call to each process to create the reference, and GC would involve checking each process to see what its count is. However, it would mean that within each process, threads could create additional references or remove references in an atomic way.

In a single-CPU system, this would be the same cost as currently, since that situation would devolve to having just one place to check for references, so it would be no more expensive. In a two-CPU system, I have no expertise on the actual call overheads of object creation and garbage collection, but logically it would double the effort of object creation and destruction (all such operations now need to occur in both processes) while keeping reference increments and decrements atomic.

Once again, I'm really sorry if I'm completely off-base since I have never done any actual coding in this area, but I thought I'd make the suggestion just in case it happened to have relevance.

Thanks,
-Tennessee
Re: [Python-Dev] Removing the GIL (Me, not you!)
On 9/13/07, Greg Ewing [EMAIL PROTECTED] wrote:
> Jason Orendorff wrote:
>> The clever bit is that SpiderMonkey's per-object locking does *not* require a context switch or even an atomic instruction, in the usual case where an object is *not* shared among threads.
>
> How does it tell whether an object is shared between threads? That sounds like the really clever bit to me.

If you look at the article, they have a code sample. Basically a global is owned by the first thread that touches it. That thread can do whatever it wants with that global. If another thread wants to touch the global, it locks everything to do so.

This is a pretty good idea except that in Python there are so many globals that all threads benefit from having access to. Luckily, except for their reference counts, they're mostly read-only. Therefore, if we can work out this reference count, we can probably use a similar concept.

Justin
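The ownership scheme described above can be sketched in a few lines of Python. This is a toy model of the idea only (the class and method names are invented, and SpiderMonkey does this in C with a much subtler escalation protocol): the first thread to touch an object owns it and pays no synchronization cost; a second thread forces the object into a locked state.

```python
import threading

class ThinLocked:
    """Toy per-object lock: free for the owning thread, real lock only
    after another thread touches the object."""

    def __init__(self, value):
        self.value = value
        self.owner = threading.get_ident()  # first toucher owns it
        self.lock = None                    # no mutex until contention

    def set(self, value):
        if self.lock is None and threading.get_ident() == self.owner:
            self.value = value              # fast path: no locking at all
            return
        if self.lock is None:
            self.lock = threading.Lock()    # escalate on first contention
        with self.lock:
            self.value = value

obj = ThinLocked(0)
obj.set(1)                                  # owner thread: fast path

t = threading.Thread(target=obj.set, args=(2,))
t.start()
t.join()                                    # other thread: escalated

print(obj.value, obj.lock is not None)      # 2 True
```

Note the escalation step itself races in this sketch (two contending threads could each create a lock); a real implementation makes that transition atomic, which is exactly the clever bit Greg is asking about.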
Re: [Python-Dev] Removing the GIL (Me, not you!)
But this has been raised before, and was rejected as not worth the amount of work that would be required to achieve it. In my understanding, there is an important difference between it was rejected, and it was not done. Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Removing the GIL (Me, not you!)
On 9/12/07, Martin v. Löwis [EMAIL PROTECTED] wrote:
> Now we are getting into details: you do NOT have to lock an object to modify its reference count. An atomic increment/decrement operation is enough.

One could measure the performance hit incurred by using atomic operations for refcounting by hacking a few macros -- right?

Deferred reference counting (DRC for short) might help:
http://www.memorymanagement.org/glossary/d.html#deferred.reference.counting

I can explain a little more how this works if anyone's interested. DRC basically eliminates reference counting for locals--that is, pointers from the stack to an object. An object becomes refcounted only when some other object gets a pointer to it. The drawback is that destructors aren't called quite as promptly as in true refcounting. (They're still called in the right order, though--barring cycles, an object's destructor is called before its children's destructors.)

What counts as "stack" is up to the implementation; typically it means the C stack. This could be used to eliminate most refcounting in C code, although listobject.c would keep it. The amount of per-platform assembly code needed is surprisingly small (and won't change, once you've written it--the Tamarin JavaScript VM does this).

You could go further and treat the Python f_locals and interpreter stack as "stack". I think this would eliminate all refcounting in the interpreter. Of course, it complicates matters that f_locals is actually an object visible from Python.

Just a thought, not a demand, please don't flame me,
-j
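The mechanics of deferred reference counting can be modeled in a short Python sketch. Everything here is illustrative (the "zero count table" is the classic DRC device; `heap_ref`, `collect`, etc. are made-up names): only heap-to-heap references are counted, and an object whose count hits zero is parked in the ZCT rather than destroyed, because an uncounted stack reference might still exist.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.rc = 0          # counts heap references only, never stack

zct = set()                  # "zero count table": possibly-dead objects

def heap_ref(obj):
    obj.rc += 1
    zct.discard(obj)         # a heap reference proves it is alive

def heap_unref(obj):
    obj.rc -= 1
    if obj.rc == 0:
        zct.add(obj)         # deferred: maybe dead, decide at collect time

def collect(stack_roots):
    """Reclaim ZCT members with no stack reference; return their names."""
    dead = {o for o in zct if o not in stack_roots}
    zct.difference_update(dead)
    return sorted(o.name for o in dead)

a, b = Obj("a"), Obj("b")
heap_ref(a); heap_ref(b)
heap_unref(a); heap_unref(b)          # both hit zero -> into the ZCT

result = collect(stack_roots={b})     # b is still referenced from a stack
print(result)                         # ['a']
```

This shows the trade-off Jason describes: locals never touch a refcount, at the price of destructors running only at the next collection rather than immediately.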
Re: [Python-Dev] Removing the GIL (Me, not you!)
Brett We should probably document where all of these globals lists are Brett instead of relying on looking for all file level static Brett declarations or something. I smell a wiki page. Skip Brett Or would there be benefit to moving things like this to the Brett interpreter struct so that threads within a single interpreter Brett call are locked but interpreters can act much more independently? Would that simplify things all that much? All containers would probably still rely on the GIL. Also, all objects rely on it to do reference count increment/decrement as I recall. Skip ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Removing the GIL (Me, not you!)
Martin Now we are getting into details: you do NOT have to lock an Martin object to modify its reference count. An atomic Martin increment/decrement operation is enough. Implemented in asm I suspect? For common CPUs this could just be part of the normal Python distribution. For uncommon ones this could use a lock until someone gets around to writing the necessary couple lines of assembler. Skip ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Removing the GIL (Me, not you!)
I was reading GvR's post on this and came up with a theory on how to tackle the problem. I ended up putting it in a blog post.

http://www.brainwavelive.com/blog/index.php?/archives/12-Suggestion-for-removing-the-Python-Global-Interpreter-Lock.html

What do you think?

Prateek

On Sep 12, 2007, at 9:08 PM, [EMAIL PROTECTED] wrote:
> Martin Now we are getting into details: you do NOT have to lock an
> Martin object to modify its reference count. An atomic
> Martin increment/decrement operation is enough.
>
> Implemented in asm I suspect? For common CPUs this could just be part of the normal Python distribution. For uncommon ones this could use a lock until someone gets around to writing the necessary couple lines of assembler.
>
> Skip
Re: [Python-Dev] Removing the GIL (Me, not you!)
> What do you think?

I think what you are describing is the situation of today, except in a less performant way. The kernel *already* implements such a synchronization server, except that all CPUs can act as such.

You write: "Since we are guaranteeing that synchronized code is running on a single core, it is the equivalent of a lock at the cost of a context switch." This is precisely what a lock costs today: a context switch.

Since the Python interpreter is synchronized all of the time, it would completely run on the synchronization server all of the time. As you identify, that single CPU might get overloaded, so your scheme would give no benefits (since Python code could never run in parallel), and only disadvantages (since multiple Python interpreters today can run on multiple CPUs, but could not anymore under your scheme).

Regards,
Martin
[Python-Dev] Removing the GIL (Me, not you!)
Hi, I had a whole long email about exactly what I was doing, but I think I'll get to the point instead. I'm trying to implement a python concurrency API and would like to use cpython to do it. To do that, I would like to remove the GIL. So, since I'm new to interpreter hacking, some help would be appreciated. I've listed what I think the GIL does; if you guys could add to this list or refine it, I would very much appreciate it. Roles of the GIL: 1. Some global interpreter state/modules are protected (where are these globals at?) 2. When writing C extensions I can change the state of my python object without worrying about synchronization 3. When writing C extensions I can change my own internal C state without worrying about synchronization (unless I have other, non-python threads running) Does anyone know of a place where the GIL is required when not operating on a python object? It seems like this would never happen, and would make replacing the GIL somewhat easier. I've only started looking at the code recently, so please forgive my naivety. I'm still learning how the interpreter works on a high level, let alone all the nitty gritty details! Thanks, Justin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Removing the GIL (Me, not you!)
1. Some global interpreter state/modules are protected (where are these globals at?) It's the interpreter and thread state itself (pystate.h), for the thread state, also _PyThreadState_Current. Then there is the GC state, in particular generations. There are various caches and counters also. 2. When writing C extensions I can change the state of my python object without worrying about synchronization 3. When writing C extensions I can change my own internal C state without worrying about synchronization (unless I have other, non-python threads running) 4. The builtin container types are protected by the GIL, and various other builtin objects 5. Reference counting is protected by the GIL 6. PyMalloc is protected by the GIL. Does anyone know of a place where the GIL is required when not operating on a python object? See 6 above, also (obviously) 1. I've only started looking at the code recently, so please forgive my naivety. I'm still learning how the interpreter works on a high level, let alone all the nitty gritty details! Good luck! Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Removing the GIL (Me, not you!)
On 9/11/07, Martin v. Löwis [EMAIL PROTECTED] wrote:
> It's the interpreter and thread state itself (pystate.h); for the thread
> state, also _PyThreadState_Current. Then there is the GC state, in
> particular generations. There are various caches and counters also.

Caches seem like they definitely might be a problem. Would you mind expanding on this a little? What gets cached, and why?

Justin
Re: [Python-Dev] Removing the GIL (Me, not you!)
    Justin> Caches seem like they definitely might be a problem. Would you
    Justin> mind expanding on this a little? What gets cached and why?

I believe the integer free list falls into this category.

Skip
Re: [Python-Dev] Removing the GIL (Me, not you!)
> > It's the interpreter and thread state itself (pystate.h); for the
> > thread state, also _PyThreadState_Current. Then there is the GC state,
> > in particular generations. There are various caches and counters also.
>
> Caches seem like they definitely might be a problem. Would you mind
> expanding on this a little? What gets cached and why?

What precisely gets cached depends on the Python version. Several types preserve a pool of preallocated objects to speed up allocation. Examples are intobject.c (block_list, free_list), frameobject.c (free_list), listobject.c (free_list), methodobject.c (free_list), floatobject.c (block_list, free_list), and classobject.c (free_list).

Plus there are tons of variables caching string objects. From classobject.c alone: getattrstr, setattrstr, delattrstr, docstr, modstr, namestr, initstr, delstr, reprstr, strstr, hashstr, eqstr, cmpstr, getitemstr, setitemstr, delitemstr, lenstr, iterstr, nextstr, getslicestr, setslicestr, delslicestr, __contains__, all arguments to UNARY, UNARY_FB, BINARY, and BINARY_INPLACE (e.g. instance_neg, instance_or, instance_ior), then cmp_obj, nonzerostr, and indexstr. (Admittedly, classobject.c is extreme here.) There are probably more such caches which I just forgot.

Regards,
Martin
Re: [Python-Dev] Removing the GIL (Me, not you!)
At 10:07 AM 9/11/2007 -0500, Justin Tulloss wrote:
> Caches seem like they definitely might be a problem. Would you mind
> expanding on this a little? What gets cached and why?

It's not just caches and counters. It's also every built-in type structure, builtin module, builtin function... any Python object that's a built-in, period. That includes things like None, True, and False.

Caches would include such things as the pre-created integers -100 through 255, the 1-byte character strings for chr(0)-chr(255), and the interned strings cache, to name a few.

Most of the things I've mentioned are truly global, and not specific to an individual interpreter.
Re: [Python-Dev] Removing the GIL (Me, not you!)
On 9/11/07, Martin v. Löwis [EMAIL PROTECTED] wrote:
> Depends on the Python version what precisely gets cached. Several types
> preserve a pool of preallocated objects, to speed up allocation. [...]
> There are probably more classes which I just forgot.

We should probably document where all of these global lists are, instead of relying on looking for all the file-level static declarations or something. Or would there be a benefit to moving things like this into the interpreter struct, so that threads within a single interpreter are locked but interpreters can act much more independently?

-Brett
Re: [Python-Dev] Removing the GIL (Me, not you!)
> It's not just caches and counters. It's also every built-in type
> structure, builtin module, builtin function... any Python object that's
> a built-in, period. That includes things like None, True, and False.

Sure - but those things don't get modified that often, except for their reference count. In addition, they are objects, and Justin seems to believe that things are easier if they are objects.

Regards,
Martin
Re: [Python-Dev] Removing the GIL (Me, not you!)
On Sep 11, 2007, at 3:30 PM, Brett Cannon wrote:
> We should probably document where all of these globals lists are instead
> of relying on looking for all file level static declarations or
> something. Or would there be benefit to moving things like this to the
> interpreter struct so that threads within a single interpreter call are
> locked but interpreters can act much more independently?

This would be nice. It would be really nice if Python were embeddable more like Tcl: separate interpreters really are separate, and don't share state. That means basically everything has to be stored in an interp-specific data structure, not in static variables. But this has been raised before, and was rejected as not worth the amount of work that would be required to achieve it. (It's certainly not worth it enough for *me* to do the work, so I can't blame anyone else for making the same determination.)

James
Re: [Python-Dev] Removing the GIL (Me, not you!)
> We should probably document where all of these globals lists are instead
> of relying on looking for all file level static declarations or
> something.

I'm not sure what would be gained here, except for people occasionally (i.e. every three years) asking how they can best get rid of the GIL.

> Or would there be benefit to moving things like this to the interpreter
> struct so that threads within a single interpreter call are locked but
> interpreters can act much more independently?

The multiple-interpreter feature doesn't quite work, and likely won't for the foreseeable future; specifically, objects can easily leak across interpreters. That's actually not a problem for immutable objects, like the strings, but here the global objects that PJE mentions come into play: types, including exceptions. Making them per-interpreter would probably break quite some code.

As for the cached strings - it would be easy to make a global table of these, e.g. calling them _PyS__init__, _PyS__add__, and so on. These could be initialized at startup, simplifying the code that uses them because it wouldn't have to worry about failures.

Regards,
Martin
Re: [Python-Dev] Removing the GIL (Me, not you!)
Phillip J. Eby wrote:
> It's also every built-in type structure, builtin module, builtin
> function... any Python object that's a built-in, period.

Where "built-in" in this context means anything implemented in C (i.e. it includes extension modules).

--
Greg
Re: [Python-Dev] Removing the GIL (Me, not you!)
Martin v. Löwis wrote:
> Sure - but those things don't get modified that often, except for their
> reference count.

The reference count is the killer, though -- you have to lock the object even to do that. And it happens a LOT, to all objects, including immutable ones.

--
Greg