Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-14 Thread Adam Olsen
On 9/13/07, Greg Ewing [EMAIL PROTECTED] wrote:
 [EMAIL PROTECTED] wrote:
  what if ... we use atomic test-and-set to
  handle reference counting (with a lock for those CPU architectures where we
  haven't written the necessary assembler fragment), then implement a lock for
  each mutable type and another for global state (thread state, interpreter
  state, etc)?

 Could be worth a try. A first step might be to just implement
 the atomic refcounting, and run that single-threaded to see
 if it has terribly bad effects on performance.

I've done this experiment.  It cost about 12% on my box.  Later, once I
had everything else set up so I could run two threads simultaneously, I
found much worse costs.  All those literals become shared objects that
create contention.

I'm now working on an approach that writes out refcounts in batches to
reduce contention.  The initial cost is much higher, but it scales
better too.  I've currently got it to just under 50% cost, meaning two
threads is a slight net gain.

-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-14 Thread Steve Holden
Hrvoje Nikšić wrote:
 On Thu, 2007-09-13 at 13:15 +0200, Martin v. Löwis wrote:
 To put it another way, would it actually matter if the reference
 counts for such objects became hopelessly wrong due to non-atomic
 adjustments?
 If they drop to zero (which may happen due to non-atomic adjustments),
 Python will try to release the static memory, which will crash the
 malloc implementation.
 
 More precisely, Python will call the deallocator appropriate for the
 object type.  If that deallocator does nothing, the object continues to
 live.  Such objects could also start out with a refcount of sys.maxint
 or so to ensure that calls to the no-op deallocator are unlikely.
 
The thought of adding references is amusing. What happens when a 
refcount becomes negative by overflow? I know, I should read the source ...

regards
  Steve
-- 
Steve Holden+1 571 484 6266   +1 800 494 3119
Holden Web LLC/Ltd   http://www.holdenweb.com
Skype: holdenweb  http://del.icio.us/steve.holden

Sorry, the dog ate my .sigline



Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-14 Thread Justin Tulloss
On 9/14/07, Adam Olsen [EMAIL PROTECTED] wrote:

  Could be worth a try. A first step might be to just implement
  the atomic refcounting, and run that single-threaded to see
  if it has terribly bad effects on performance.

 I've done this experiment.  It cost about 12% on my box.  Later, once I
 had everything else set up so I could run two threads simultaneously, I
 found much worse costs.  All those literals become shared objects that
 create contention.


It's hard to argue with cold hard facts when all we have is raw speculation.
What do you think of a model where there is a global thread count that
keeps track of how many threads reference an object? Then there are
thread-specific reference counters for each object. When a thread's refcount
goes to 0, it decrefs the object's thread count. If you did this right,
hopefully there would only be cache updates when you update the thread
count, which will only be when a thread first references an object and when
it last references an object.

I mentioned this idea earlier and it's growing on me. Since you've actually
messed around with the code, do you think this would alleviate some of the
contention issues?

Justin
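Justin's model can be sketched in C; the following is a minimal single-threaded illustration (the struct layout and names are invented for the example, not CPython's, and in a real multi-threaded build the shared thread count would need atomic updates):

```c
#include <assert.h>

#define MAX_THREADS 4

/* Hypothetical layout for the scheme above: one shared "owner" count
 * plus a per-thread refcount slot.  Names are illustrative, not
 * CPython's; thread_count would need atomic updates in real code. */
typedef struct {
    int thread_count;            /* how many threads hold references */
    int local_ref[MAX_THREADS];  /* per-thread refcounts */
    int live;                    /* stands in for "not yet deallocated" */
} obj_t;

static void local_incref(obj_t *o, int tid)
{
    if (o->local_ref[tid]++ == 0)
        o->thread_count++;       /* first reference from this thread */
}

static void local_decref(obj_t *o, int tid)
{
    if (--o->local_ref[tid] == 0 && --o->thread_count == 0)
        o->live = 0;             /* stand-in for calling the deallocator */
}
```

The point is that `local_ref` is only ever touched by its owning thread; the shared `thread_count` changes only on a thread's first and last reference, which is where the hoped-for cache savings come from.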


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-14 Thread Hrvoje Nikšić
On Thu, 2007-09-13 at 18:38 -0500, [EMAIL PROTECTED] wrote:
 Hrvoje More precisely, Python will call the deallocator appropriate for
 Hrvoje the object type.  If that deallocator does nothing, the object
 Hrvoje continues to live.  Such objects could also start out with a
 Hrvoje refcount of sys.maxint or so to ensure that calls to the no-op
 Hrvoje deallocator are unlikely.
 
 Maybe sys.maxint/2?  You'd hate for the first incref to invoke the
 deallocator even if it was a no-op.  

ob_refcnt is signed.  :-)




Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-14 Thread Adam Olsen
On 9/14/07, Justin Tulloss [EMAIL PROTECTED] wrote:

 On 9/14/07, Adam Olsen [EMAIL PROTECTED] wrote:
   Could be worth a try. A first step might be to just implement
   the atomic refcounting, and run that single-threaded to see
   if it has terribly bad effects on performance.
 
  I've done this experiment.  It cost about 12% on my box.  Later, once I
  had everything else set up so I could run two threads simultaneously, I
  found much worse costs.  All those literals become shared objects that
  create contention.

 It's hard to argue with cold hard facts when all we have is raw speculation.
 What do you think of a model where there is a global thread count that
 keeps track of how many threads reference an object? Then there are
 thread-specific reference counters for each object. When a thread's refcount
 goes to 0, it decrefs the object's thread count. If you did this right,
 hopefully there would only be cache updates when you update the thread
 count, which will only be when a thread first references an object and when
 it last references an object.

 I mentioned this idea earlier and it's growing on me. Since you've actually
 messed around with the code, do you think this would alleviate some of the
 contention issues?

There would be some poor worst-case behaviour.  In the case of
literals you'd start referencing them when you call a function, then
stop when the function returns.  The same goes for any shared data structure.

I think caching/buffering refcounts in general holds promise though.
My current approach uses a crude hash table as a cache and only
flushes when there's a collision or when the tracing GC starts up.  So
far I've only got about 50% of the normal performance, but that's with
90% or more scalability, and I'm hoping to keep improving it.

-- 
Adam Olsen, aka Rhamphoryncus
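The cache Adam describes might look roughly like this single-threaded sketch (the table size, hash, and names are guesses for illustration; the real code would also flush when the tracing GC starts up):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Crude per-thread refcount cache: deltas accumulate in a small open
 * hash table and are written back to the real (shared) refcount only
 * on a slot collision or an explicit flush. */
#define SLOTS 8

typedef struct { long refcnt; } obj_t;

typedef struct {
    obj_t *key[SLOTS];
    long   delta[SLOTS];
} refcache_t;

static void cache_flush_slot(refcache_t *c, size_t i)
{
    if (c->key[i]) {
        c->key[i]->refcnt += c->delta[i];  /* the only shared write */
        c->key[i] = NULL;
        c->delta[i] = 0;
    }
}

static void cache_adjust(refcache_t *c, obj_t *o, long d)
{
    size_t i = ((uintptr_t)o >> 4) % SLOTS;
    if (c->key[i] != o) {       /* collision (or empty slot): flush it */
        cache_flush_slot(c, i);
        c->key[i] = o;
    }
    c->delta[i] += d;
}

static void cache_flush_all(refcache_t *c)
{
    for (size_t i = 0; i < SLOTS; i++)
        cache_flush_slot(c, i);
}
```

Most incref/decref pairs then cancel out locally and never touch the shared refcount at all.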


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-14 Thread Tony Nelson
At 1:51 AM -0500 9/14/07, Justin Tulloss wrote:
On 9/14/07, Adam Olsen [EMAIL PROTECTED] wrote:

 Could be worth a try. A first step might be to just implement
 the atomic refcounting, and run that single-threaded to see
 if it has terribly bad effects on performance.

I've done this experiment.  It cost about 12% on my box.  Later, once I
had everything else set up so I could run two threads simultaneously, I
found much worse costs.  All those literals become shared objects that
create contention.


It's hard to argue with cold hard facts when all we have is raw
speculation. What do you think of a model where there is a global thread
count that keeps track of how many threads reference an object? Then
there are thread-specific reference counters for each object. When a
thread's refcount goes to 0, it decrefs the object's thread count. If you
did this right, hopefully there would only be cache updates when you
update the thread count, which will only be when a thread first references
an object and when it last references an object.

It's likely that cache line contention is the issue, so don't glom all the
different threads' refcount for an object into one vector.  Keep each
thread's refcounts in a per-thread vector of objects, so only that thread
will cache that vector, or make refcounts so large that each will be in its
own cache line (usu. 64 bytes, not too horrible for testing purposes).  I
don't know all that would be required for separate vectors of refcounts,
but each object could contain its index into the vectors, which would all
be the same size (Go Virtual Memory!).


I mentioned this idea earlier and it's growing on me. Since you've
actually messed around with the code, do you think this would alleviate
some of the contention issues?

Justin

Your idea can be combined with the maxint/2 initial refcount for
non-disposable objects, which should about eliminate thread-count updates
for them.
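Tony's padding suggestion can be sketched as follows; the 64-byte line size and the struct layout are assumptions for illustration:

```c
#include <assert.h>

/* One way to keep each thread's refcount out of the others' cache
 * lines, per the suggestion above: pad every counter to a full
 * 64-byte line so no two threads' counters share a line. */
#define CACHE_LINE  64
#define MAX_THREADS 4

typedef struct {
    long count;
    char pad[CACHE_LINE - sizeof(long)];
} padded_count_t;

typedef struct {
    padded_count_t ref[MAX_THREADS];  /* one private slot per thread */
} obj_refs_t;
```

This trades memory (64 bytes per thread per object) for the absence of false sharing, which is why it is framed as acceptable "for testing purposes".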
-- 

TonyN.:'   mailto:[EMAIL PROTECTED]
  '  http://www.georgeanelson.com/


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-14 Thread Justin Tulloss
Your idea can be combined with the maxint/2 initial refcount for
 non-disposable objects, which should about eliminate thread-count updates
 for them.
 --


 I don't really like the maxint/2 idea because it requires us to
differentiate between globals and everything else. Plus, it's a hack. I'd
like a more elegant solution if possible.

Justin


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-14 Thread Jean-Paul Calderone
On Fri, 14 Sep 2007 14:13:47 -0500, Justin Tulloss [EMAIL PROTECTED] wrote:
Your idea can be combined with the maxint/2 initial refcount for
 non-disposable objects, which should about eliminate thread-count updates
 for them.
 --


 I don't really like the maxint/2 idea because it requires us to
differentiate between globals and everything else. Plus, it's a hack. I'd
like a more elegant solution if possible.

It's not really a solution either.  If your program runs for a couple
minutes and then exits, maybe it won't trigger some catastrophic behavior
from this hack, but if you have a long running process then you're almost
certain to be screwed over by this (it wouldn't even have to be *very*
long running - a month or two could do it on a 32-bit platform).

Jean-Paul


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-14 Thread Steve Holden
Jean-Paul Calderone wrote:
 On Fri, 14 Sep 2007 14:13:47 -0500, Justin Tulloss [EMAIL PROTECTED] wrote:
 Your idea can be combined with the maxint/2 initial refcount for
 non-disposable objects, which should about eliminate thread-count updates
 for them.
 --

 I don't really like the maxint/2 idea because it requires us to
 differentiate between globals and everything else. Plus, it's a hack. I'd
 like a more elegant solution if possible.
 
 It's not really a solution either.  If your program runs for a couple
 minutes and then exits, maybe it won't trigger some catastrophic behavior
 from this hack, but if you have a long running process then you're almost
 certain to be screwed over by this (it wouldn't even have to be *very*
 long running - a month or two could do it on a 32bit platform).
 
Could each class define the value to be added to or subtracted from the 
refcount? We'd only need a bit to store the value (since it would always 
be zero or one), but the execution time might increase quite a lot if 
there's no nifty way to conditionally add or subtract one.

regards
  Steve
-- 
Steve Holden+1 571 484 6266   +1 800 494 3119
Holden Web LLC/Ltd   http://www.holdenweb.com
Skype: holdenweb  http://del.icio.us/steve.holden

Sorry, the dog ate my .sigline



Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-14 Thread Jean-Paul Calderone
On Fri, 14 Sep 2007 17:43:39 -0400, James Y Knight [EMAIL PROTECTED] wrote:

On Sep 14, 2007, at 3:30 PM, Jean-Paul Calderone wrote:
On Fri, 14 Sep 2007 14:13:47 -0500, Justin Tulloss  [EMAIL PROTECTED] 
wrote:
Your idea can be combined with the maxint/2 initial refcount for
non-disposable objects, which should about eliminate thread-count 
updates
for them.
--

I don't really like the maxint/2 idea because it requires us to
differentiate between globals and everything else. Plus, it's a  hack. I'd
like a more elegant solution if possible.

It's not really a solution either.  If your program runs for a couple
minutes and then exits, maybe it won't trigger some catastrophic  behavior
from this hack, but if you have a long running process then you're  almost
certain to be screwed over by this (it wouldn't even have to be *very*
long running - a month or two could do it on a 32bit platform).

Not true: the refcount becoming 0 only calls a dealloc function.  For 
objects which are not deletable, the dealloc function should simply set the 
refcount back to maxint/2.  Done.


So, eg, replace the Py_FatalError in none_dealloc with an assignment to
ob_refcnt?  Good point, sounds like it could work (I'm pretty sure you
know more about deallocation in CPython than I :).

Jean-Paul
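James's trick is easy to sketch: give immortal objects a deallocator that resurrects them. This is an illustrative stand-alone model, not CPython's actual none_dealloc:

```c
#include <assert.h>
#include <limits.h>

/* An "immortal" object's deallocator just restores a huge refcount
 * instead of freeing anything.  In CPython terms this would replace
 * e.g. the Py_FatalError in none_dealloc; layout here is invented. */
#define IMMORTAL_REFCNT (LONG_MAX / 2)

typedef struct obj {
    long ob_refcnt;
    void (*dealloc)(struct obj *);
} obj_t;

static void immortal_dealloc(obj_t *o)
{
    o->ob_refcnt = IMMORTAL_REFCNT;  /* resurrect rather than free */
}

static void decref(obj_t *o)
{
    if (--o->ob_refcnt == 0)
        o->dealloc(o);
}
```

However wrong the count becomes under racy updates, the object can never actually be freed; the cost is one rare extra dealloc call whenever the count drifts down to zero.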


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-14 Thread Adam Olsen
On 9/14/07, Jean-Paul Calderone [EMAIL PROTECTED] wrote:
 On Fri, 14 Sep 2007 17:43:39 -0400, James Y Knight [EMAIL PROTECTED] wrote:
 
 On Sep 14, 2007, at 3:30 PM, Jean-Paul Calderone wrote:
 On Fri, 14 Sep 2007 14:13:47 -0500, Justin Tulloss  [EMAIL PROTECTED]
 wrote:
 Your idea can be combined with the maxint/2 initial refcount for
 non-disposable objects, which should about eliminate thread-count
 updates
 for them.
 --
 
 I don't really like the maxint/2 idea because it requires us to
 differentiate between globals and everything else. Plus, it's a  hack. I'd
 like a more elegant solution if possible.
 
 It's not really a solution either.  If your program runs for a couple
 minutes and then exits, maybe it won't trigger some catastrophic  behavior
 from this hack, but if you have a long running process then you're  almost
 certain to be screwed over by this (it wouldn't even have to be *very*
 long running - a month or two could do it on a 32bit platform).
 
 Not true: the refcount becoming 0 only calls a dealloc function.  For
 objects which are not deletable, the dealloc function should simply set the
 refcount back to maxint/2.  Done.
 

 So, eg, replace the Py_FatalError in none_dealloc with an assignment to
 ob_refcnt?  Good point, sounds like it could work (I'm pretty sure you
 know more about deallocation in CPython than I :).

As I've said, this is all moot.  The cache coherence protocols on x86
mean this will be nearly as slow as proper atomic refcounting, and
will not scale if multiple threads regularly touch the object.  My
experience is that they will touch it regularly.

-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-14 Thread Greg Ewing
Justin Tulloss wrote:
 
 What do you think of a model where there is a global 
 thread count that keeps track of how many threads reference an object? 

I've thought about that sort of thing before. The problem
is how you keep track of how many threads reference an
object, without introducing far more overhead than
you're trying to eliminate.

 Then there are thread-specific reference counters for each object.

What happens when a new thread comes into existence? Do
you go through all existing objects and add another element
to their refcount arrays?

--
Greg


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-14 Thread Talin
Adam Olsen wrote:
 I'm now working on an approach that writes out refcounts in batches to
 reduce contention.  The initial cost is much higher, but it scales
 better too.  I've currently got it to just under 50% cost, meaning two
 threads is a slight net gain.

http://www.research.ibm.com/people/d/dfb/publications.html

Look at the various papers on 'Recycler'.

The way it works is that for each thread, there is an addref buffer and 
a decref buffer. The buffers are arrays of pointers. Each time a 
reference is addref'd, it's appended to the addref buffer, and likewise for 
decref. When a buffer gets full, it is added to a queue and then a new 
buffer is allocated.

There is a background thread that actually applies the refcounts from 
the buffers and frees the objects. Since this background thread is the 
only thread that ever touches the actual refcount field of the object, 
there's no need for locking.

-- Talin
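A stripped-down, single-threaded model of those per-thread buffers (buffer-full queueing and the background thread are omitted; names are illustrative):

```c
#include <assert.h>
#include <stddef.h>

/* Mutator threads only append pointers to addref/decref logs; a
 * single collector later applies them, so only it ever touches
 * the real refcount field and no locking is needed there. */
#define BUF 16

typedef struct { long refcnt; int live; } obj_t;

typedef struct {
    obj_t *addref[BUF], *decref[BUF];
    size_t na, nd;
} logbuf_t;

static void log_incref(logbuf_t *b, obj_t *o) { b->addref[b->na++] = o; }
static void log_decref(logbuf_t *b, obj_t *o) { b->decref[b->nd++] = o; }

/* Collector: apply increments first so counts never dip spuriously. */
static void collect(logbuf_t *b)
{
    for (size_t i = 0; i < b->na; i++)
        b->addref[i]->refcnt++;
    for (size_t i = 0; i < b->nd; i++)
        if (--b->decref[i]->refcnt == 0)
            b->decref[i]->live = 0;    /* stand-in for freeing */
    b->na = b->nd = 0;
}
```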


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Justin Tulloss
 What do you think?


I'm going to have to agree with Martin here, although I'm not sure I
understand what you're saying entirely. Perhaps if you explained where the
benefits of this approach come from, it would clear up what you're thinking.


After a few days of thought, I'm starting to realize the difficulty of
maintaining compatibility with existing C extensions after removing the GIL.
The possible C-level side effects are very difficult to work around without
kernel or hardware level transaction support. I see a couple of approaches
that might work (though I probably haven't thought of everything).

1. Use message passing and transactions.
Put every module into its own tasklet that ends up getting owned by one
thread or another. Every call to an object that is owned by that module is
put into a module wide message queue and delivered sequentially to its
objects. All this does is serialize requests to objects implemented in C to
slightly mitigate the need to lock. Then use transactions to protect any
python object. You still have the problem of C side effects going unnoticed
(IE Thread A executes function, object sets c-state in a certain way, Thread
B calls the same function, changes all the C-state, A reacts to return value
that no longer reflects on the actual state). So, this doesn't actually
work, but its close since python objects will remain consistent
w/transactions and conflicting C-code won't execute simultaneously.

2. Do it perl style.
Perl just spawns off multiple interpreters and doesn't share state between
them. This would require cleaning up what state belongs where, and probably
providing some global state lock free. For instance, all the numbers,
letters, and None are read only, so we could probably work out a way to
share them between threads. In fact, any python global could be read only
until it is written to. Then it belongs to the thread that wrote to it and
is updated in the other threads via some sort of cache-coherency protocol. I
haven't really wrapped my head around how C extensions would play with this
yet, but essentially code operating in different threads would be operating
on different copies of the modules. That seems fair to me.

3. Come up with an elegant way of handling multiple python processes. Of
course, this has some downsides. I don't really want to pickle python
objects around if I decide they need to be in another address space, which I
would probably occasionally need to do if I abstracted away the fact that a
bunch of interpreters had been spawned off.

4. Remove the GIL, use transactions for python objects, and adapt all
C-extensions to be thread safe. Woo.

I'll keep kicking around ideas for a while; hopefully they'll become more
refined as I explore the code more.

Justin

PS. A good paper on how hardware transactional memory could help us out:
http://www-faculty.cs.uiuc.edu/~zilles/papers/python_htm.dls2006.pdf
A few of you have probably read this already. Martin is even acknowledged,
but it was news to me!


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Christian Heimes
Phillip J. Eby wrote:
 It's not just caches and counters.  It's also every built-in type 
 structure, builtin module, builtin function...  any Python object 
 that's a built-in, period.  That includes things like None, True, and False.
 
 Caches would include such things as the pre-created integers -100 
 through 255, the 1-byte character strings for chr(0)-chr(255), and 
 the interned strings cache, to name a few.
 
 Most of these things I've mentioned are truly global, and not 
 specific to an individual interpreter.

Pardon my ignorance but why does Python do reference counting for truly
global and static objects like None, True, False, small and cached
integers, sys and other builtins? If I understand it correctly these
objects are never garbage-collected (at least they shouldn't be) until the
interpreter exits. Wouldn't it decrease the overhead and increase speed
when Py_INCREF and Py_DECREF are NOOPs for static and immutable objects?

Christian



Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread André Malo
* Christian Heimes wrote: 

 Pardon my ignorance but why does Python do reference counting for truly
 global and static objects like None, True, False, small and cached
 integers, sys and other builtins? If I understand it correctly these
 objects are never garbage-collected (at least they shouldn't be) until the
 interpreter exits. Wouldn't it decrease the overhead and increase speed
 when Py_INCREF and Py_DECREF are NOOPs for static and immutable objects?

The check for what kind of object you have takes time, too. Right now, just 
counting up or down is most likely faster than that check on every refcount 
operation.

nd
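The check André means would sit on the hottest path in the interpreter; a sketch of what a no-op-for-static INCREF costs per operation (the flag layout is invented for the example, not CPython's):

```c
#include <assert.h>

/* The trade-off under discussion: making INCREF a no-op for "static"
 * objects adds a test-and-branch to every single refcount operation. */
#define FLAG_STATIC 0x1

typedef struct { long ob_refcnt; unsigned flags; } obj_t;

static void incref(obj_t *o)
{
    if (!(o->flags & FLAG_STATIC))   /* the extra per-op branch */
        o->ob_refcnt++;
}
```

An unconditional increment is a single instruction; the version above adds a load, a test, and a (well-predicted but not free) branch to one of the most frequent operations in the interpreter.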


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Jon Ribbens
On Thu, Sep 13, 2007 at 12:19:21PM +0200, André Malo wrote:
  Pardon my ignorance but why does Python do reference counting for truly
  global and static objects like None, True, False, small and cached
  integers, sys and other builtins? If I understand it correctly these
  objects are never garbage-collected (at least they shouldn't be) until the
  interpreter exits. Wouldn't it decrease the overhead and increase speed
  when Py_INCREF and Py_DECREF are NOOPs for static and immutable objects?
 
 The check for what kind of object you have takes time, too. Right now, just 
 counting up or down is most likely faster than that check on every refcount 
 operation.

To put it another way, would it actually matter if the reference
counts for such objects became hopelessly wrong due to non-atomic
adjustments?


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Martin v. Löwis
 To put it another way, would it actually matter if the reference
 counts for such objects became hopelessly wrong due to non-atomic
 adjustments?

If they drop to zero (which may happen due to non-atomic adjustments),
Python will try to release the static memory, which will crash the
malloc implementation.

Regards,
Martin


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Jon Ribbens
On Thu, Sep 13, 2007 at 01:15:39PM +0200, Martin v. Löwis wrote:
  To put it another way, would it actually matter if the reference
  counts for such objects became hopelessly wrong due to non-atomic
  adjustments?
 
 If they drop to zero (which may happen due to non-atomic adjustments),
 Python will try to release the static memory, which will crash the
 malloc implementation.

That could be avoided by a flag on the object which is checked in
free(). I'm just suggesting it as an alternative as it sounds like
it might be more efficient than either locking or avoiding having
reference counts on these objects (especially if the reference count
is initialised to MAX_INT/2 or whatever).


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread skip

Jon To put it another way, would it actually matter if the reference
Jon counts for such objects became hopelessly wrong due to non-atomic
Jon adjustments?

I believe this was suggested and tried by someone (within the last few
years).  It showed no benefit.  The costs of special-casing outweighed the
costs of uniform reference counting, not to mention the code got more
complex.  Or something like that.  Anyway, it didn't work.

Just thinking out loud here, what if ... we use atomic test-and-set to
handle reference counting (with a lock for those CPU architectures where we
haven't written the necessary assembler fragment), then implement a lock for
each mutable type and another for global state (thread state, interpreter
state, etc)?  Might that be close enough to free threading to provide some
benefits, but not so fine-grained that lock contention becomes a bottleneck?

Skip
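Skip's atomic refcounting could be sketched with GCC/Clang atomic builtins (a portability assumption on my part; architectures without them would take the lock fallback he mentions):

```c
#include <assert.h>

/* Atomic incref/decref with no lock on the common path.  The decref
 * reports when the count hit zero so the caller can deallocate. */
typedef struct { long ob_refcnt; } obj_t;

static void atomic_incref(obj_t *o)
{
    __atomic_fetch_add(&o->ob_refcnt, 1, __ATOMIC_SEQ_CST);
}

static int atomic_decref(obj_t *o)   /* returns 1 when count hits 0 */
{
    return __atomic_sub_fetch(&o->ob_refcnt, 1, __ATOMIC_SEQ_CST) == 0;
}
```

Note that even when uncontended, these read-modify-write operations are substantially more expensive than a plain increment, which is consistent with the ~12% single-threaded cost reported earlier in the thread.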


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Hrvoje Nikšić
On Thu, 2007-09-13 at 13:15 +0200, Martin v. Löwis wrote:
  To put it another way, would it actually matter if the reference
  counts for such objects became hopelessly wrong due to non-atomic
  adjustments?
 
 If they drop to zero (which may happen due to non-atomic adjustments),
 Python will try to release the static memory, which will crash the
 malloc implementation.

More precisely, Python will call the deallocator appropriate for the
object type.  If that deallocator does nothing, the object continues to
live.  Such objects could also start out with a refcount of sys.maxint
or so to ensure that calls to the no-op deallocator are unlikely.

The part I don't understand is how Python would know which objects are
global/static.  Testing for such a thing sounds like something that
would be slower than atomic incref/decref.




Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Prateek Sureka

On Sep 13, 2007, at 10:12 AM, Martin v. Löwis wrote:

 What do you think?

 I think what you are describing is the situation of today,
 except in a less-performant way. The kernel *already*
 implements such a synchronization server, except that
 all CPUs can act as such. You write

 Since we are guaranteeing that synchronized code is running on a  
 single
 core, it is the equivalent of a lock at the cost of a context switch.

 This is precisely what a lock costs today: a context switch.


Really? Wouldn't we save some memory allocation overhead (since in my 
design, the lock is really just a simple kernel instruction as 
opposed to a full-blown object), thereby lowering lock overhead (and 
allowing us to go with finer-grained locks)?

Since we're using an asynch message queue for the synch-server, it  
sounds like a standard lock-free algorithm.


 Since the Python interpreter is synchronized all of the time, it
 would completely run on the synchronization server all of the
 time. As you identify, that single CPU might get overloaded, so
 your scheme would give no benefits (since Python code could never
 run in parallel),

I think I neglected to mention that the locking would still need to  
be more fine grained - perhaps only do the context switch around  
refcounts (and the other places where the GIL is critical).
If we can do this in a way that allows simple list comprehensions to  
run in parallel, that would be really helpful (like a truly parallel  
map function).


 and only disadvantages (since multiple Python
 interpreters today can run on multiple CPUs, but could not
 anymore under your scheme).

Well, you could still run python code in parallel if you used  
multiple processes (each process having its own 'synchronization  
server'). Is that what you meant?


On Sep 13, 2007, at 12:38 PM, Justin Tulloss wrote:

 What do you think?

 I'm going to have to agree with Martin here, although I'm not sure  
 I understand what you're saying entirely. Perhaps if you explained  
 where the benefits of this approach come from, it would clear up  
 what you're thinking.

Well, my interpretation of the current problem is that removing the  
GIL has not been productive because of problems with lock contention  
on multi-core machines. Naturally, we need to make the locking more  
fine-grained to resolve this. Hopefully we can do so in a way that  
does not increase the lock overhead (hence my suggestion for a lock-free 
approach using an asynch queue and a core as dedicated server).

If we can somehow guarantee all GC operations (which is why the GIL  
is needed in the first place) run on a single core, we get locking  
for free without actually having to have threads spinning.

regards,
Prateek


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Martin v. Löwis
 Since we are guaranteeing that synchronized code is running on a single
 core, it is the equivalent of a lock at the cost of a context switch.

 This is precisely what a lock costs today: a context switch.

 
 Really? Wouldn't we save some memory allocation overhead (since in my
 design, the lock is really just a simple kernel instruction as opposed
 to a full-blown object)

The GIL is a single variable, not larger than 50 bytes or so. Locking
it requires no memory at all in user-space, and might require 8 bytes
or so per waiting thread in kernel-space.

 thereby lowering lock overhead

Why do you think lock overhead is related to memory consumption?

 Since we're using an asynch message queue for the synch-server, it
 sounds like a standard lock-free algorithm.

You lost me here. What are you trying to achieve? It's not the lock
that people complain about, but that Python runs serially most
of the time.

 I think I neglected to mention that the locking would still need to be
 more fine grained - perhaps only do the context switch around refcounts
 (and the other places where the GIL is critical).

I think this is the point where I need to say good luck implementing
it.

 Well, my interpretation of the current problem is that removing the GIL
 has not been productive because of problems with lock contention on
 multi-core machines.

My guess is that this interpretation is wrong. It was reported that
there was a slowdown by a factor of 2 in a single-threaded application.
That can't be due to lock contention.

 If we can somehow guarantee all GC operations (which is why the GIL is
 needed in the first place)

No, unless we disagree on what a GC operation is.

Regards,
Martin



Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Prateek Sureka

On Sep 13, 2007, at 9:25 PM, Martin v. Löwis wrote:

 Since we are guaranteeing that synchronized code is running on a  
 single
 core, it is the equivalent of a lock at the cost of a context  
 switch.

 This is precisely what a lock costs today: a context switch.


 Really? Wouldn't we save some memory allocation overhead (since in my
 design, the lock is really just a simple kernel instruction as opposed
 to a full-blown object)

 The GIL is a single variable, not larger than 50 Bytes or so. Locking
 it requires no memory at all in user-space, and might require 8 bytes
 or so per waiting thread in kernel-space.

 thereby lowering lock overhead

 Why do you think lock overhead is related to memory consumption?

Well, it can be one (or both) of two things - 1) memory consumption,  
2) cost of acquiring and releasing the locks (which you said is the  
same as a context switch).

Since we've also identified (according to GvR's post:
http://www.artima.com/weblogs/viewpost.jsp?thread=214235) that the slowdown
was 2x in a single threaded application (which couldn't be due to  
lock contention), it must be due to lock overhead (unless the  
programming was otherwise faulty or there is something else about  
locks that I don't know about - Martin?). Hence I'm assuming that we  
need to reduce lock overhead. If acquiring and releasing locks (part  
of lock overhead) is a simple context switch (and I don't doubt you  
here), then the only remaining thing to optimize is memory operations  
related to lock objects.


 Since we're using an asynch message queue for the synch-server, it
 sounds like a standard lock-free algorithm.

 You lost me here. What are you trying to achieve? It's not the lock
 that people complain about, but that Python runs serially most
 of the time.

http://en.wikipedia.org/wiki/Lock-free_and_wait-free_algorithms#The_lock-free_approach

Specifically, i'm trying to achieve the approach using a deposit  
request.


 I think I neglected to mention that the locking would still need  
 to be
 more fine grained - perhaps only do the context switch around  
 refcounts
 (and the other places where the GIL is critical).

 I think this is the point where I need to say good luck implementing
 it.

I don't mean to be unhelpful. It's just that this discussion started
because people (not me - although I would definitely benefit) showed  
interest in removing the GIL.

 Well, my interpretation of the current problem is that removing  
 the GIL
 has not been productive because of problems with lock contention on
 multi-core machines.

 My guess is that this interpretation is wrong. It was reported that
 there was a slowdown by a factor of 2 in a single-threaded  
 application.
 That can't be due to lock contention.

I agree with your point Martin (see my analysis above). Regarding  
lock contention: I'm guessing that if single threaded applications  
are so badly affected, then the cumulative overhead on multithreaded  
applications will be even worse. So we need to reduce the overhead.  
But then since all Python code runs under the GIL - which is a pretty  
coarse lock, we have to make the new locking more fine-grained (which  
is what I think the original patch by Greg Stein did). I'm also  
guessing that if you do that then for each refcount you're going to  
have to acquire a lock... which happens *very* frequently (and I  
think by your earlier responses you concur). So that means anytime  
multiple threads try to access the same object, they will need to do  
an incref/decref, e.g. if you access a global variable inside a for-loop
from multiple threads.

 If we can somehow guarantee all GC operations (which is why the  
 GIL is
 needed in the first place)

 No, unless we disagree on what a GC operation is.

Ok. Other people know more about the specifics of the GIL than I do.  
However, the main issue with removing the GIL seems to be the  
reference counting algorithm. That is what I was alluding to. In any  
case, it is not relevant for the rest of the discussion.

regards,
Prateek


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Martin v. Löwis
 http://www.artima.com/weblogs/viewpost.jsp?thread=214235) that the
 slowdown was 2x in a single threaded application (which couldn't be due
 to lock contention), it must be due to lock overhead (unless the
 programming was otherwise faulty or there is something else about locks
 that I don't know about - Martin?). Hence I'm assuming that we need to
 reduce lock overhead. If acquiring and releasing locks (part of lock
 overhead) is a simple context switch (and I don't doubt you here), then
 the only remaining thing to optimize is memory operations related to
 lock objects.

I think you are putting too many assumptions on top of each other.
It might also have been that the locks in the slow implementation
were too fine-grained, and that some performance could have been
regained by making them coarser again.

 You lost me here. What are you trying to achieve? It's not the lock
 that people complain about, but that Python runs serially most
 of the time.
 
 http://en.wikipedia.org/wiki/Lock-free_and_wait-free_algorithms#The_lock-free_approach

The asynchronous model assumes that the sender can continue to process
data without needing a reply. This is not true for the Python threading
model: if the thread needs access to some data structure, it really
needs to wait for the result of that access, because that's the
semantics of the operations.

 Specifically, i'm trying to achieve the approach using a deposit request.

For that to work, you need to produce a list of requests that can be
processed asynchronously. I can't see any in the Python interpreter.

 I'm also guessing that if
 you do that then for each refcount you're going to have to acquire a
 lock... which happens *very* frequently (and I think by your earlier
 responses you concur).

In that it occurs frequently - not in that you have to acquire a lock
to modify the refcount. You don't.

 Ok. Other people know more about the specifics of the GIL than I do.
 However, the main issue with removing the GIL seems to be the reference
 counting algorithm. 

It isn't. Reference counting could be done easily without the GIL.
It's rather the container objects, and the global variables, that
need protection.

Regards,
Martin


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Adam Olsen
On 9/13/07, Hrvoje Nikšić [EMAIL PROTECTED] wrote:
 On Thu, 2007-09-13 at 13:15 +0200, Martin v. Löwis wrote:
   To put it another way, would it actually matter if the reference
   counts for such objects became hopelessly wrong due to non-atomic
   adjustments?
 
  If they drop to zero (which may happen due to non-atomic adjustments),
  Python will try to release the static memory, which will crash the
  malloc implementation.

 More precisely, Python will call the deallocator appropriate for the
 object type.  If that deallocator does nothing, the object continues to
 live.  Such objects could also start out with a refcount of sys.maxint
 or so to ensure that calls to the no-op deallocator are unlikely.

 The part I don't understand is how Python would know which objects are
 global/static.  Testing for such a thing sounds like something that
 would be slower than atomic incref/decref.

I've explained my experiments here:
http://www.artima.com/forums/flat.jsp?forum=106&thread=214235&start=30&msRange=15#279978

Basically though, atomic incref/decref won't work.  Once you've got
two threads modifying the same location the costs skyrocket.  Even
without being properly atomic you'll get the same slowdown on x86
(whose cache coherency is fairly strict).

The only two options are:
A) Don't modify an object on every incref/decref.  Deletion must be
delayed.  This lets you share (thread-safe) objects.
B) Don't share *any* objects.  This is a process model (even if
they're lightweight like erlang).  For the near future, it's much
easier to do this using real processes though.

Threading is much more powerful, but it remains to be proven that it
can be done efficiently.

-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Jason Orendorff
On 9/13/07, Justin Tulloss [EMAIL PROTECTED] wrote:
 1. Use message passing and transactions.  [...]
 2. Do it perl style. [...]
 3. Come up with an elegant way of handling multiple python processes. [...]
 4. Remove the GIL, use transactions for python objects, [...]

The SpiderMonkey JavaScript engine takes a very different approach,
described here:
http://developer.mozilla.org/en/docs/SpiderMonkey_Internals:_Thread_Safety

The SpiderMonkey C API threading model should sound familiar:  C code
can assume that simple operations, like dictionary lookups, are atomic
and thread-safe.  C code must explicitly JS_SuspendRequest() before
doing blocking I/O or number-crunching (just like
Py_BEGIN_ALLOW_THREADS).  The main difference is that SpiderMonkey's
requests are not mutually exclusive, the way the GIL is.

SpiderMonkey does fine-grained locking for mutable objects to avoid
race conditions.  The clever bit is that SpiderMonkey's per-object
locking does *not* require a context switch or even an atomic
instruction, in the usual case where an object is *not* shared among
threads.  (Programs that embed SpiderMonkey therefore run faster if
they manage to ensure that threads share relatively few mutable
objects.  JavaScript doesn't have modules.)

Suppose Python went this route.  There would still have to be a
stop-the-world global lock, because the cycle collector won't work
if other threads are going about changing pointers.  (SpiderMonkey's
GC does the same thing.)  Retaining such a lock has another advantage:
this change could be completely backward-compatible to extensions.
Just use this global lock as the GIL when entering a non-thread-safe
extension (all existing extensions would be considered
non-thread-safe).

This means non-thread-safe extensions would be hoggish (but not much
worse than they are already!).  Making an existing extension
thread-safe would require some thought, but it wouldn't be terribly
hard.  In the simplest cases, the extension writer could just add a
flag to the type saying "OK, I'm thread-safe."

Refcounting is another major issue.  SpiderMonkey uses GC instead.
CPython would need to do atomic increfs/decrefs.  (Deferred
refcounting could mitigate the cost.)

The main drawback (aside from the amount of work) is the patent.
SpiderMonkey's license grants a worldwide, royalty-free license, but
not under the Python license.  I think this could be wrangled, if the
technical approach looks worthwhile.

-j


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Adam Olsen
On 9/13/07, Justin Tulloss [EMAIL PROTECTED] wrote:


 On 9/13/07, Adam Olsen [EMAIL PROTECTED] wrote:
 
  Basically though, atomic incref/decref won't work.  Once you've got
  two threads modifying the same location the costs skyrocket.  Even
  without being properly atomic you'll get the same slowdown on x86
  (whose cache coherency is fairly strict).


 I'm a bit skeptical of the actual costs of atomic incref. For there to be
 contention, you would need to have to be modifying the same memory location
 at the exact same time. That seems unlikely to ever happen. We can't bank on
 it never happening, but an occasionally expensive operation is ok. After
 all, it's occasional.

That was my initial expectation too.  However, the incref *is* a
modification.  It's not simply an issue of the exact same time, but
anything that causes the cache entries to bounce back and forth and
delay the rest of the pipeline.  If you have a simple loop like "for i
in range(count): 1.0 + n", then the 1.0 literal will get shared between
threads, and the refcount will get hammered.

Is it reasonable to expect that much sharing?  I think it is.
Literals are an obvious example, but there's also configuration data
passed between threads.  Pystone seems to have enough sharing to kill
performance.  And after all, isn't sharing the whole point (even in
the definition) of threads?

-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Justin Tulloss
On 9/13/07, Jason Orendorff [EMAIL PROTECTED] wrote:

 On 9/13/07, Justin Tulloss [EMAIL PROTECTED] wrote:
  1. Use message passing and transactions.  [...]
  2. Do it perl style. [...]
  3. Come up with an elegant way of handling multiple python processes.
 [...]
  4. Remove the GIL, use transactions for python objects, [...]

 The SpiderMonkey JavaScript engine takes a very different approach,
 described here:
 http://developer.mozilla.org/en/docs/SpiderMonkey_Internals:_Thread_Safety


This is basically the same as what perl does, as far as I understand it.
There are differences, but they're not that substantial. It's basically the
idea of keeping all state separate and treating global access as a special
case. I think this is a pretty solid approach, since globals shouldn't be
accessed that often. What we would want to do differently is make sure that
read-only globals can be cheaply accessed from any thread. Otherwise we lose
the performance benefit of having them in the first place.

Refcounting is another major issue.  SpiderMonkey uses GC instead.
 CPython would need to do atomic increfs/decrefs.  (Deferred
 refcounting could mitigate the cost.)


This is definitely something to think about. I don't really have an answer
straight off, but there are several things we could try.

The main drawback (aside from the amount of work) is the patent.
 SpiderMonkey's license grants a worldwide, royalty-free license, but
 not under the Python license.  I think this could be wrangled, if the
 technical approach looks worthwhile.


I'm not sure this is an issue. It's not like we would be using the code,
just the patented algorithm. Any code we wrote to implement the algorithm
would of course be covered under the python license. I'm not a legal guy
though.

Justin


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Justin Tulloss
On 9/13/07, Adam Olsen [EMAIL PROTECTED] wrote:


 Basically though, atomic incref/decref won't work.  Once you've got
 two threads modifying the same location the costs skyrocket.  Even
 without being properly atomic you'll get the same slowdown on x86
 (whose cache coherency is fairly strict).




I'm a bit skeptical of the actual costs of atomic incref. For there to be
contention, you would need to have to be modifying the same memory location
at the exact same time. That seems unlikely to ever happen. We can't bank on
it never happening, but an occasionally expensive operation is ok. After
all, it's occasional.

Justin


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Greg Ewing
Martin v. Löwis wrote:

 Now we are getting into details: you do NOT have to lock
 an object to modify its reference count. An atomic
 increment/decrement operation is enough.

I stand corrected. But if it were as simple as that,
I think it would have been done by now. I got the
impression that this had already been tried, and it
was still too slow.

--
Greg


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread skip

Hrvoje More precisely, Python will call the deallocator appropriate for
Hrvoje the object type.  If that deallocator does nothing, the object
Hrvoje continues to live.  Such objects could also start out with a
Hrvoje refcount of sys.maxint or so to ensure that calls to the no-op
Hrvoje deallocator are unlikely.

Maybe sys.maxint/2?  You'd hate for the first incref to invoke the
deallocator even if it was a no-op.  
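The pseudo-immortal-object idea from the last two messages can be sketched like this (hypothetical names, not CPython's actual `Py_INCREF`/`Py_DECREF` machinery): the object starts with a refcount far from both zero and overflow, and its no-op deallocator re-saturates the count instead of freeing, so even a hopelessly wrong count cannot kill it.

```c
#include <limits.h>

#define IMMORTAL_START (LONG_MAX / 2)   /* far from 0 and from overflow */

typedef struct {
    long refcnt;
    int  immortal;    /* selects the no-op deallocator          */
    int  freed;       /* stands in for actually freeing memory  */
} Obj;

static void obj_dealloc(Obj *o)
{
    if (o->immortal) {
        /* No-op deallocator: re-saturate the count and keep living,
           even if racing threads somehow drove it to zero. */
        o->refcnt = IMMORTAL_START;
        return;
    }
    o->freed = 1;     /* a real object would be released here */
}

void obj_incref(Obj *o) { o->refcnt++; }

void obj_decref(Obj *o)
{
    if (--o->refcnt == 0)
        obj_dealloc(o);
}
```

A normal object under this scheme is freed as usual; an "immortal" one survives any number of unbalanced decrefs, at the cost of its memory never being reclaimed.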

Skip


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Jon Ribbens
On Thu, Sep 13, 2007 at 06:38:05PM -0500, [EMAIL PROTECTED] wrote:
 Hrvoje More precisely, Python will call the deallocator appropriate for
 Hrvoje the object type.  If that deallocator does nothing, the object
 Hrvoje continues to live.  Such objects could also start out with a
 Hrvoje refcount of sys.maxint or so to ensure that calls to the no-op
 Hrvoje deallocator are unlikely.
 
 Maybe sys.maxint/2?  You'd hate for the first incref to invoke the
 deallocator even if it was a no-op.  

I do believe I already suggested that ;-)


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Greg Ewing
Jon Ribbens wrote:
 To put it another way, would it actually matter if the reference
 counts for such objects became hopelessly wrong due to non-atomic
 adjustments?

Again, it would cost time to check whether you could
get away with doing non-atomic refcounting.

If you're thinking that no check would be needed because
only things like True, False and None would be shared
between threads, that's quite wrong. If the threads
are to communicate at all, they need to share some
kind of data somewhere.

Also keep in mind that there is one case of wrong
refcounting that would be disastrous, which is the
case where the refcount becomes zero prematurely.

-- 
Greg Ewing, Computer Science Dept, +--+
University of Canterbury,  | Carpe post meridiem! |
Christchurch, New Zealand  | (I'm not a morning person.)  |
[EMAIL PROTECTED]  +--+


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Greg Ewing
[EMAIL PROTECTED] wrote:
 what if ... we use atomic test-and-set to
 handle reference counting (with a lock for those CPU architectures where we
 haven't written the necessary assembler fragment), then implement a lock for
 each mutable type and another for global state (thread state, interpreter
 state, etc)?

Could be worth a try. A first step might be to just implement
the atomic refcounting, and run that single-threaded to see
if it has terribly bad effects on performance.



Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Greg Ewing
Prateek Sureka wrote:
 Naturally, we need to make the locking more  
 fine-grained to resolve this. Hopefully we can do so in a way that  
 does not increase the lock overhead (hence my suggestion for a lock-free
 approach using an async queue and a core as a dedicated server).

What you don't seem to see is that this would have
no less overhead, and probably a lot *more*, than
a mutex or other standard synchronisation mechanism.
Certainly a lot more than an atomic instruction for
the incref/decref.



Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Greg Ewing
Jason Orendorff wrote:
 The clever bit is that SpiderMonkey's per-object
 locking does *not* require a context switch or even an atomic
 instruction, in the usual case where an object is *not* shared among
 threads.

How does it tell whether an object is shared between
threads? That sounds like the really clever bit to me.



Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Tennessee Leeuwenburg
Pardon me for talking with no experience in such matters, but...

Okay, incrementing a reference counter is atomic, therefore the cheapest
possible operation. Is it possible to keep reference counting atomic in a
multi-thread model?

Could you do the following... let's consider two threads, A and B. Each
time an object is created, a reference count is created in both A and B.
Let's suppose A has a real reference and B has no reference really.
Couldn't the GC check two reference registers for a reference count? The
object would then be cleaned up only if both registers were 0.

To exploit multiple CPUs, you could have two persistent Python processes on
each CPU with its own mini-GIL. Object creation would then involve a call to
each process to create the reference and GC would involve checking each
process to see what their count is. However, it would mean that within each
process, threads could create additional references or remove references in
an atomic way.

In a single-CPU system, this would be the same cost as currently, since I
think that situation would devolve to having just one place to check for
references. This seems to mean it would be no more expensive for a
single-CPU system.

In a two-CPU system, I'm no expertise on the actual call overheads of object
creation and garbage collection, but logically it would double the effort of
object creation and destruction (all such operations now need to occur on
both processes) but would keep reference increments and decrements atomic.

Once again, I'm really sorry if I'm completely off-base since I have never
done any actual coding in this area, but I thought I'd make the suggestion
just in case it happened to have relevance.

Thanks,
-Tennessee


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-13 Thread Justin Tulloss
On 9/13/07, Greg Ewing [EMAIL PROTECTED] wrote:

 Jason Orendorff wrote:
  The clever bit is that SpiderMonkey's per-object
  locking does *not* require a context switch or even an atomic
  instruction, in the usual case where an object is *not* shared among
  threads.

 How does it tell whether an object is shared between
 threads? That sounds like the really clever bit to me.


If you look at the article, they have a code sample.

Basically a global is owned by the first thread that touches it. That
thread can do whatever it wants with that global. If another thread wants to
touch the global, it locks everything to do so.

This is a pretty good idea except that in Python there are so many globals
that all threads benefit from having access to. Luckily, except for their
reference counts, they're mostly read-only. Therefore, if we can work out
this reference count, we can probably use a similar concept.

Justin


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-12 Thread Martin v. Löwis
 But this has been raised before, and was rejected as not worth the  
 amount of work that would be required to achieve it.

In my understanding, there is an important difference between
"it was rejected" and "it was not done".

Regards,
Martin


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-12 Thread Jason Orendorff
On 9/12/07, Martin v. Löwis [EMAIL PROTECTED] wrote:
 Now we are getting into details: you do NOT have to lock
 an object to modify its reference count. An atomic
 increment/decrement operation is enough.

One could measure the performance hit incurred by using atomic
operations for refcounting by hacking a few macros -- right?
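A rough sketch of what that macro hack could look like, using GCC's modern `__atomic` builtins for brevity on a stand-in struct (the real experiment would patch `Py_INCREF`/`Py_DECREF` in CPython's headers and compare benchmark timings between the two builds; `MyObject`, `My_INCREF`, and `churn` are invented names):

```c
/* Build once with -DATOMIC_REFCOUNT and once without, then time a
   tight incref/decref loop (or pystone) in each build. */
typedef struct { long ob_refcnt; } MyObject;   /* stand-in for PyObject */

#ifdef ATOMIC_REFCOUNT
#  define My_INCREF(op) __atomic_add_fetch(&(op)->ob_refcnt, 1, __ATOMIC_SEQ_CST)
#  define My_DECREF(op) __atomic_sub_fetch(&(op)->ob_refcnt, 1, __ATOMIC_SEQ_CST)
#else
#  define My_INCREF(op) (++(op)->ob_refcnt)
#  define My_DECREF(op) (--(op)->ob_refcnt)
#endif

/* The loop you would time: n balanced incref/decref pairs. */
long churn(long n)
{
    MyObject obj = { 1 };
    for (long i = 0; i < n; i++) {
        My_INCREF(&obj);
        My_DECREF(&obj);
    }
    return obj.ob_refcnt;
}
```

The difference between the two builds on a single thread gives a lower bound on the atomic-refcount tax, before any cross-core contention enters the picture.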

Deferred reference counting (DRC for short) might help...
http://www.memorymanagement.org/glossary/d.html#deferred.reference.counting

I can explain a little more how this works if anyone's interested.
DRC basically eliminates reference counting for locals--that is,
pointers from the stack to an object.  An object becomes refcounted
only when some other object gets a pointer to it.  The drawback is
that destructors aren't called quite as promptly as in true
refcounting.  (They're still called in the right order,
though--barring cycles, an object's destructor is called before its
children's destructors.)

What counts as stack is up to the implementation; typically it means
the C stack.  This could be used to eliminate most refcounting in C
code, although listobject.c would keep it.  The amount of per-platform
assembly code needed is surprisingly small (and won't change, once
you've written it--the Tamarin JavaScript VM does this).

You could go further and treat the Python f_locals and interpreter
stack as stack. I think this would eliminate all refcounting in the
interpreter.  Of course, it complicates matters that f_locals is
actually an object visible from Python.
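The zero-count-table machinery behind deferred refcounting can be sketched in a much-simplified form (hypothetical names; real DRC also has to bound the table and handle objects moving on and off the stack between collections):

```c
#include <stdbool.h>
#include <stddef.h>

#define ZCT_MAX 64

typedef struct {
    long heap_refcnt;   /* counts only heap -> object references */
    bool in_zct;
    bool dead;          /* stands in for running the destructor  */
} DrcObj;

static DrcObj *zct[ZCT_MAX];    /* the zero-count table */
static size_t  zct_len;

/* A heap reference went away.  Instead of freeing at zero, park the
   object in the ZCT: an uncounted stack slot may still reference it. */
void drc_decref(DrcObj *o)
{
    if (--o->heap_refcnt <= 0 && !o->in_zct) {
        o->in_zct = true;
        zct[zct_len++] = o;
    }
}

/* At a safe point, scan the stack roots: ZCT entries that are neither
   rooted nor heap-referenced are truly dead and get their destructor. */
void drc_collect(DrcObj **roots, size_t n_roots)
{
    for (size_t i = 0; i < zct_len; i++) {
        DrcObj *o = zct[i];
        bool rooted = false;
        for (size_t j = 0; j < n_roots; j++)
            if (roots[j] == o) { rooted = true; break; }
        o->in_zct = false;
        if (!rooted && o->heap_refcnt <= 0)
            o->dead = true;
    }
    zct_len = 0;
}
```

This also shows why destructors fire a little later than with true refcounting: an object dropped by the heap only dies at the next collection, once no stack slot holds it.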

Just a thought, not a demand, please don't flame me,
-j


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-12 Thread skip

Brett We should probably document where all of these globals lists are
Brett instead of relying on looking for all file level static
Brett declarations or something.

I smell a wiki page.

Skip

Brett Or would there be benefit to moving things like this to the
Brett interpreter struct so that threads within a single interpreter
Brett call are locked but interpreters can act much more independently?

Would that simplify things all that much?  All containers would probably
still rely on the GIL.  Also, all objects rely on it to do reference count
increment/decrement as I recall.

Skip


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-12 Thread skip

Martin Now we are getting into details: you do NOT have to lock an
Martin object to modify its reference count. An atomic
Martin increment/decrement operation is enough.

Implemented in asm I suspect?  For common CPUs this could just be part of
the normal Python distribution.  For uncommon ones this could use a lock
until someone gets around to writing the necessary couple lines of
assembler.

Skip


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-12 Thread Prateek Sureka
I was reading GvR's post on this and came up with a theory on how to  
tackle the problem.


I ended up putting it in a blog post.

http://www.brainwavelive.com/blog/index.php?/archives/12-Suggestion-for-removing-the-Python-Global-Interpreter-Lock.html

What do you think?

Prateek

On Sep 12, 2007, at 9:08 PM, [EMAIL PROTECTED] wrote:



Martin Now we are getting into details: you do NOT have to lock an
Martin object to modify its reference count. An atomic
Martin increment/decrement operation is enough.

Implemented in asm I suspect?  For common CPUs this could just be part of
the normal Python distribution.  For uncommon ones this could use a lock
until someone gets around to writing the necessary couple lines of
assembler.

Skip




Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-12 Thread Martin v. Löwis
 What do you think?

I think what you are describing is the situation of today,
except in a less-performant way. The kernel *already*
implements such a synchronization server, except that
all CPUs can act as such. You write

Since we are guaranteeing that synchronized code is running on a single
core, it is the equivalent of a lock at the cost of a context switch.

This is precisely what a lock costs today: a context switch.

Since the Python interpreter is synchronized all of the time, it
would completely run on the synchronization server all of the
time. As you identify, that single CPU might get overloaded, so
your scheme would give no benefits (since Python code could never
run in parallel), and only disadvantages (since multiple Python
interpreters today can run on multiple CPUs, but could not
anymore under your scheme).

Regards,
Martin


[Python-Dev] Removing the GIL (Me, not you!)

2007-09-11 Thread Justin Tulloss
Hi,

I had a whole long email about exactly what I was doing, but I think I'll
get to the point instead. I'm trying to implement a python concurrency API
and would like to use cpython to do it. To do that, I would like to remove
the GIL.

So, since I'm new to interpreter hacking, some help would be appreciated.
I've listed what I think the GIL does; if you guys could add to this list or
refine it, I would very much appreciate it.

Roles of the GIL:
1. Some global interpreter state/modules are protected (where are these
globals at?)
2. When writing C extensions I can change the state of my python object
without worrying about synchronization
3. When writing C extensions I can change my own internal C state without
worrying about synchronization (unless I have other, non-python threads
running)

Does anyone know of a place where the GIL is required when not operating on
a python object? It seems like this would never happen, and would make
replacing the GIL somewhat easier.

I've only started looking at the code recently, so please forgive my
naivety. I'm still learning how the interpreter works on a high level, let
alone all the nitty gritty details!

Thanks,
Justin


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-11 Thread Martin v. Löwis
 1. Some global interpreter state/modules are protected (where are these
 globals at?)

It's the interpreter and thread state itself (pystate.h), for the thread
state, also _PyThreadState_Current. Then there is the GC state, in
particular generations. There are various caches and counters also.

 2. When writing C extensions I can change the state of my python object
 without worrying about synchronization
 3. When writing C extensions I can change my own internal C state
 without worrying about synchronization (unless I have other, non-python
 threads running)
4. The builtin container types and various other builtin objects are
   protected by the GIL
5. Reference counting is protected by the GIL
6. PyMalloc is protected by the GIL.
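Point 5 deserves a concrete illustration: an unguarded `refcnt++` is really a load, an add, and a store, so two threads can interleave and lose an increment. The function below simulates that interleaving by hand (deterministically, with no real threads); the names are illustrative, not CPython code.

```c
/* Simulate two threads incrementing one refcount without a lock. */
static long simulate_lost_update(void) {
    long refcnt = 1;
    long t1 = refcnt;   /* thread 1 loads 1                          */
    long t2 = refcnt;   /* thread 2 loads 1 before thread 1 stores   */
    refcnt = t1 + 1;    /* thread 1 stores 2                         */
    refcnt = t2 + 1;    /* thread 2 also stores 2: one INCREF lost   */
    return refcnt;      /* 2, though two increments happened         */
}
```

With the GIL (or an atomic add), the result would be 3; without either, it can be 2, and a later DECREF pair would free a live object.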

 Does anyone know of a place where the GIL is required when not operating
 on a python object?

See 6 above, also (obviously) 1.

 I've only started looking at the code recently, so please forgive my
 naivety. I'm still learning how the interpreter works on a high level,
 let alone all the nitty gritty details!

Good luck!

Martin


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-11 Thread Justin Tulloss
On 9/11/07, Martin v. Löwis [EMAIL PROTECTED] wrote:

  1. Some global interpreter state/modules are protected (where are these
  globals at?)

 It's the interpreter and thread state itself (pystate.h), for the thread
 state, also _PyThreadState_Current. Then there is the GC state, in
 particular generations. There are various caches and counters also.


Caches seem like they definitely might be a problem. Would you mind
expanding on this a little? What gets cached and why?

Justin


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-11 Thread skip

Justin Caches seem like they definitely might be a problem. Would you
Justin mind expanding on this a little? What gets cached and why?

I believe the integer free list falls into this category.

Skip


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-11 Thread Martin v. Löwis
 It's the interpreter and thread state itself (pystate.h), for the thread
 state, also _PyThreadState_Current. Then there is the GC state, in
 particular generations. There are various caches and counters also.
 
 
 Caches seem like they definitely might be a problem. Would you mind
 expanding on this a little? What gets cached and why?

What precisely gets cached depends on the Python version. Several types
preserve a pool of preallocated objects, to speed up allocation.
Examples are intobject.c (block_list, free_list), frameobject.c
(free_list), listobject.c (free_list), methodobject.c (free_list),
floatobject.c (block_list, free_list), classobject.c (free_list).
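The free-list pattern those files use can be sketched as follows. This is a toy version with illustrative names: freed objects are chained for reuse instead of being returned to malloc, which is exactly why every thread touching the type contends on the same list once the GIL is gone.

```c
#include <stdlib.h>

typedef struct node {
    struct node *next;   /* link used while the node sits on the free list */
    long value;
} node;

static node *free_list = NULL;   /* file-level static, like CPython's */

static node *node_alloc(long v) {
    node *n;
    if (free_list != NULL) {     /* pop a recycled node: no malloc call */
        n = free_list;
        free_list = n->next;
    } else {
        n = malloc(sizeof(node));
    }
    n->value = v;
    return n;
}

static void node_free(node *n) { /* push onto the free list for reuse */
    n->next = free_list;
    free_list = n;
}
```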

Plus there are tons of variables caching string objects. From
classobject.c alone: getattrstr, setattrstr, delattrs, docstr,
modstr, namestr, initstr, delstr, reprstr, strstr, hashstr, eqstr,
cmpstr, getitemstr, setitemstr, delitemstr, lenstr, iterstr, nextstr,
getslicestr, setslicestr, delslicestr, __contains__, all arguments
to UNARY, UNARY_FB, BINARY, BINARY_INPLACE (e.g. instance_neg,
instance_or, instance_ior), then cmp_obj, nonzerostr, indexstr.
(admittedly, classobject.c is extreme here).

There are probably more classes which I just forgot.

Regards,
Martin


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-11 Thread Phillip J. Eby
At 10:07 AM 9/11/2007 -0500, Justin Tulloss wrote:


On 9/11/07, Martin v. Löwis [EMAIL PROTECTED] wrote:
  1. Some global interpreter state/modules are protected (where are these
  globals at?)

It's the interpreter and thread state itself (pystate.h), for the thread
state, also _PyThreadState_Current. Then there is the GC state, in
particular generations. There are various caches and counters also.


Caches seem like they definitely might be a problem. Would you mind 
expanding on this a little? What gets cached and why?

It's not just caches and counters.  It's also every built-in type 
structure, builtin module, builtin function...  any Python object 
that's a built-in, period.  That includes things like None, True, and False.

Caches would include such things as the pre-created small integers -5 
through 256, the 1-byte character strings for chr(0)-chr(255), and 
the interned strings cache, to name a few.

Most of these things I've mentioned are truly global, and not 
specific to an individual interpreter.



Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-11 Thread Brett Cannon
On 9/11/07, Martin v. Löwis [EMAIL PROTECTED] wrote:
  It's the interpreter and thread state itself (pystate.h), for the thread
  state, also _PyThreadState_Current. Then there is the GC state, in
  particular generations. There are various caches and counters also.
 
 
  Caches seem like they definitely might be a problem. Would you mind
  expanding on this a little? What gets cached and why?

 Depends on the Python version what precisely gets cached. Several types
 preserve a pool of preallocated objects, to speed up allocation.
 Examples are intobject.c (block_list, free_list), frameobject.c
 (free_list), listobject.c (free_list), methodobject.c (free_list),
 floatobject.c (block_list, free_list), classobject.c (free_list).

 Plus there are tons of variables caching string objects. From
 classobject.c alone: getattrstr, setattrstr, delattrs, docstr,
 modstr, namestr, initstr, delstr, reprstr, strstr, hashstr, eqstr,
 cmpstr, getitemstr, setitemstr, delitemstr, lenstr, iterstr, nextstr,
 getslicestr, setslicestr, delslicestr, __contains__, all arguments
 to UNARY, UNARY_FB, BINARY, BINARY_INPLACE (e.g. instance_neg,
 instance_or, instance_ior), then cmp_obj, nonzerostr, indexstr.
 (admittedly, classobject.c is extreme here).

 There are probably more classes which I just forgot.

We should probably document where all of these globals lists are
instead of relying on looking for all file level static declarations
or something.  Or would there be benefit to moving things like this to
the interpreter struct so that threads within a single interpreter
call are locked but interpreters can act much more independently?

-Brett


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-11 Thread Martin v. Löwis
 It's not just caches and counters.  It's also every built-in type
 structure, builtin module, builtin function...  any Python object that's
 a built-in, period.  That includes things like None, True, and False.

Sure - but those things don't get modified that often, except for their
reference count. In addition, they are objects, and Justin seems to
believe that things are easier if they are objects.

Regards,
Martin


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-11 Thread James Y Knight
On Sep 11, 2007, at 3:30 PM, Brett Cannon wrote:
 We should probably document where all of these globals lists are
 instead of relying on looking for all file level static declarations
 or something.  Or would there be benefit to moving things like this to
 the interpreter struct so that threads within a single interpreter
 call are locked but interpreters can act much more independently?

This would be nice. It would be really nice if python was embeddable  
more like TCL: separate interpreters really are separate, and don't  
share state. That means basically everything has to be stored in an  
interp-specific data structure, not in static variables.
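The TCL-style layout James describes can be sketched like this: interpreter state lives in a struct that is passed around explicitly, rather than in file-level statics, so two interpreters share nothing. The field and function names are purely illustrative.

```c
/* Per-interpreter state instead of globals (illustrative fields). */
typedef struct interp_state {
    long gc_count;            /* would otherwise be a static GC counter */
    const char *prog_name;    /* would otherwise be a global string     */
} interp_state;

static void interp_init(interp_state *interp, const char *name) {
    interp->gc_count = 0;
    interp->prog_name = name;
}
```

Every API call would then take an `interp_state *` (as Tcl's take a `Tcl_Interp *`), which is exactly the large mechanical change that made this unattractive.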

But this has been raised before, and was rejected as not worth the  
amount of work that would be required to achieve it. (it's certainly  
not worth it enough for *me* to do the work, so I can't blame anyone  
else for making the same determination.)

James


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-11 Thread Martin v. Löwis
 We should probably document where all of these globals lists are
 instead of relying on looking for all file level static declarations
 or something.

I'm not sure what would be gained here, except for people occasionally
(i.e. every three years) asking how they can best get rid of the GIL.

 Or would there be benefit to moving things like this to
 the interpreter struct so that threads within a single interpreter
 call are locked but interpreters can act much more independently?

The multiple interpreter feature doesn't quite work, and likely
won't for the foreseeable future; specifically, objects can easily
leak across interpreters. That's actually not a problem for immutable
objects, like the strings, but here the global objects that PJE
mentions come into play: types, including exceptions. Making them
per-interpreter would probably break quite some code.

As for the cached strings - it would be easy to make a global table
of these, e.g. calling them _PyS__init__, _PyS__add__, and so on.
These could be initialized at startup, simplifying the code that
uses them because they don't have to worry about failures.

Regards,
Martin


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-11 Thread Greg Ewing
Phillip J. Eby wrote:
 It's also every built-in type 
 structure, builtin module, builtin function...  any Python object 
 that's a built-in, period.

Where built-in in this context means anything implemented
in C (i.e. it includes extension modules).

--
Greg


Re: [Python-Dev] Removing the GIL (Me, not you!)

2007-09-11 Thread Greg Ewing
Martin v. Löwis wrote:
 Sure - but those things don't get modified that often, except for their
 reference count.

The reference count is the killer, though -- you have
to lock the object even to do that. And it happens
a LOT, to all objects, including immutable ones.

--
Greg