Re: Lockless algorithms in python (Nothing to do with GIL)

2010-07-06 Thread Zac Burns
I'm not an expert on this, but wouldn't it be more dependent on the platform
than python version? Perhaps it is Windows 7 that is very slow. Perhaps my
processor architecture. Not sure...

Here are some for 3.1.2x64

 import timeit
 timeit.timeit('Lock()', 'from threading import Lock')
1.4162585386092708
 timeit.timeit('dict()', 'from threading import Lock')
0.2730348901369162
 timeit.timeit('list()', 'from threading import Lock')
0.1719480219512306

-Zac

On Fri, Jul 2, 2010 at 1:26 PM, Antoine Pitrou solip...@pitrou.net wrote:

 On Mon, 28 Jun 2010 16:46:41 -0700
 Zac Burns zac...@gmail.com wrote:
  In my experience it is far more expensive to allocate a lock in python
 then
  it is the types that use them. Here are some examples:
 
   timeit.timeit('Lock()', 'from threading import Lock')
  1.4449114807669048
 
   timeit.timeit('dict()')
  0.2821554294221187
 
   timeit.timeit('list()')
  0.17358153222312467

 I'm not sure what Python version on what machine you are using, but
 here (Python 2.7):

  timeit.timeit('Lock()', 'from threading import Lock')
 0.09944796562194824
  timeit.timeit('dict()')
 0.17817902565002441
  timeit.timeit('list()')
 0.19633007049560547
  timeit.timeit('{}')
 0.03823709487915039
  timeit.timeit('[]')
 0.05156302452087402



 --
 http://mail.python.org/mailman/listinfo/python-list

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Lockless algorithms in python (Nothing to do with GIL)

2010-07-02 Thread Antoine Pitrou
On Mon, 28 Jun 2010 16:46:41 -0700
Zac Burns zac...@gmail.com wrote:
 In my experience it is far more expensive to allocate a lock in python then
 it is the types that use them. Here are some examples:
 
  timeit.timeit('Lock()', 'from threading import Lock')
 1.4449114807669048
 
  timeit.timeit('dict()')
 0.2821554294221187
 
  timeit.timeit('list()')
 0.17358153222312467

I'm not sure what Python version on what machine you are using, but
here (Python 2.7):

 timeit.timeit('Lock()', 'from threading import Lock')
0.09944796562194824
 timeit.timeit('dict()')
0.17817902565002441
 timeit.timeit('list()')
0.19633007049560547
 timeit.timeit('{}')
0.03823709487915039
 timeit.timeit('[]')
0.05156302452087402



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Lockless algorithms in python (Nothing to do with GIL)

2010-06-28 Thread Ryan Kelly
On Mon, 2010-06-28 at 16:46 -0700, Zac Burns wrote:
 In my experience it is far more expensive to allocate a lock in python
 then it is the types that use them. Here are some examples:
 
  timeit.timeit('Lock()', 'from threading import Lock')
 1.4449114807669048
 
  timeit.timeit('dict()')
 0.2821554294221187
 
  timeit.timeit('list()')
 0.17358153222312467
 
 
 Granted, I know these types do not allocate a lock - but a similarly
 defined user type would need to allocate a lock to guarantee
 thread-safety.

Sure, but I think you're timing the wrong thing here.  You would only
allocate the lock relatively rarely - it's the overhead of *acquiring*
the lock that's the real problem.

   r...@durian:~$ python -m timeit -s from threading import Lock; l = Lock() 
l.acquire(); l.release()
   100 loops, best of 3: 0.378 usec per loop

Compare to one of my favourite dict methods, setdefault:

   r...@durian:~$ python -m timeit -s d = dict() 
d.setdefault('hello','world')
   1000 loops, best of 3: 0.189 usec per loop


In the same ballpark really.

 I have not done a great deal of research into lockless algorithms
 thusfar, but would it be worthwhile to investigate implementing some
 of them in python to optimize performance critical sections of code?
 
 Many lockless algorithms that I have looked at thusfar require a
 Compare and Swap operation. Does python have an equivalent to this?
 Is python high level enough that it has something better than this or
 it simply doesn't need it?

I've managed to avoid locking in some cases by using dict.setdefault()
as a kind of atomic test-and-set operation.  It's not a full
compare-and-swap but you could implement a simple locking scheme using
it like this:


  class PretendLock:
  def __init__(self):
  self.d = dict()
  def acquire(self):
  myid = threading.currentThread().ident
  while self.d.setdefault(owner,myid) != myid:
  pass
  def release(self):
  del self.d[owner]


This depends on some peculiarities of the CPython implementation that
aren't guaranteed by the language spec - namely that thread switches
can't happen while executing C code, that dict.setdefault is implemented
in C, and that is can calculate the hash of a string key without calling
back out to python code.

Usually, it's not worth it.  The one time I use this regularly is for
lazy object initialisation, e.g. something like this:

  def get_my_object(key):
  try:
 return objects[key]
  except KeyError:
 return objects.setdefault(key,create_my_object(key))

If two threads happen to try initialising the object at the same time,
only one will wind up in the dict and being used; the other will be
discarded and garbage-collected.



  Cheers,

 Ryan




-- 
Ryan Kelly
http://www.rfk.id.au  |  This message is digitally signed. Please visit
r...@rfk.id.au|  http://www.rfk.id.au/ramblings/gpg/ for details



signature.asc
Description: This is a digitally signed message part
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Lockless algorithms in python (Nothing to do with GIL)

2010-06-28 Thread Zac Burns

 Sure, but I think you're timing the wrong thing here.  You would only
 allocate the lock relatively rarely - it's the overhead of *acquiring*
 the lock that's the real problem.

   r...@durian:~$ python -m timeit -s from threading import Lock; l =
 Lock() l.acquire(); l.release()
   100 loops, best of 3: 0.378 usec per loop

 Compare to one of my favourite dict methods, setdefault:

   r...@durian:~$ python -m timeit -s d = dict()
 d.setdefault('hello','world')
   1000 loops, best of 3: 0.189 usec per loop


 In the same ballpark really.






 I've managed to avoid locking in some cases by using dict.setdefault()
 as a kind of atomic test-and-set operation.  It's not a full
 compare-and-swap but you could implement a simple locking scheme using
 it like this:


  class PretendLock:
  def __init__(self):
  self.d = dict()
  def acquire(self):
  myid = threading.currentThread().ident
  while self.d.setdefault(owner,myid) != myid:
  pass
  def release(self):
  del self.d[owner]


 This depends on some peculiarities of the CPython implementation that
 aren't guaranteed by the language spec - namely that thread switches
 can't happen while executing C code, that dict.setdefault is implemented
 in C, and that is can calculate the hash of a string key without calling
 back out to python code.

 Usually, it's not worth it.  The one time I use this regularly is for
 lazy object initialisation, e.g. something like this:

  def get_my_object(key):
  try:
 return objects[key]
  except KeyError:
 return objects.setdefault(key,create_my_object(key))

 If two threads happen to try initialising the object at the same time,
 only one will wind up in the dict and being used; the other will be
 discarded and garbage-collected.


Thanks Ryan,

Agreed, I was timing the wrong thing - but largely because I wasn't sure
what to time against. setdefault certainly seems to be a candidate for
something to time. Even so, setdefault wins.

Though, I wouldn't want to use PretendLock in this case... it doesn't seem
that having the spin loop would justify reducing reducing the lock
allocation overhead in a no-contention scenario.

Yes, get_my_object is a good trick. I did just that to avoid instantiating
extra locks in one scenario.

--
Zachary Burns
(407)590-4814
Aim - Zac256FL
Production Engineer
Zindagi Games
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Lockless algorithms in python (Nothing to do with GIL)

2010-06-28 Thread Ryan Kelly
On Mon, 2010-06-28 at 18:27 -0700, Zac Burns wrote:


 
 I've managed to avoid locking in some cases by using
 dict.setdefault() as a kind of atomic test-and-set operation.
 It's not a full compare-and-swap but you could implement a
 simple locking scheme using it like this:
 
 
  class PretendLock:
  def __init__(self):
  self.d = dict()
  def acquire(self):
  myid = threading.currentThread().ident
  while self.d.setdefault(owner,myid) != myid:
  pass
  def release(self):
  del self.d[owner]
 
 
 This depends on some peculiarities of the CPython
 implementation that aren't guaranteed by the language spec -
 namely that thread switches can't happen while executing C
 code, that dict.setdefault is implemented in C, and that is
 can calculate the hash of a string key without calling back
 out to python code.

 
 Thanks Ryan,
 
 Agreed, I was timing the wrong thing - but largely because I wasn't
 sure what to time against. setdefault certainly seems to be a
 candidate for something to time. Even so, setdefault wins.
 
 Though, I wouldn't want to use PretendLock in this case... it doesn't
 seem that having the spin loop would justify reducing reducing the
 lock allocation overhead in a no-contention scenario.


Oh totally - nobody reading this should even consider using that
PretendLock class for anything.  It's just an example of how to use
setdefault as a test-and-set kind of operator.


  Ryan

-- 
Ryan Kelly
http://www.rfk.id.au  |  This message is digitally signed. Please visit
r...@rfk.id.au|  http://www.rfk.id.au/ramblings/gpg/ for details



signature.asc
Description: This is a digitally signed message part
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Lockless algorithms in python (Nothing to do with GIL)

2010-06-28 Thread sturlamolden

  Many lockless algorithms that I have looked at thusfar require a
  Compare and Swap operation. Does python have an equivalent to this?
  Is python high level enough that it has something better than this or
  it simply doesn't need it?

Python does have a GIL, and contrary to the title, this has a lot to
do with the GIL. Specifically, any set of operations protected by the
GIL are atomic in the CPython interpreter.

We can therefore get global synchronization 'for free', allowing 'lock
free' data structures that avoid using threading.Lock, and replaces it
with, well, nothing. :)

How to do it (it takes tiny amounts of C or Cython):

Take at look at page 14:

   http://www.dabeaz.com/python/UnderstandingGIL.pdf

The critical part is the variable _Py_Ticker in Python's source file
ceval.c, which is declared volatile int. As it is not declared static
it is a global variable, and we can actually manipulate its value from
a C extension:

   extern volatile int _Py_Ticker; // evil!!!

Now imagine temporarily setting _Py_Ticker to some ridiculous high
value like 0x7fff. That locks out all other threads, practically
forever, until we restore _Py_Ticker back to it's original value.

What this gives us is global synchronization for free! The GIL is
there anyway, so it adds no additional overhead.

By messing with _Py_Ticker we can force blocks of Python code to
become atomic in the interpreter. With this hack we can therefore
create data structures that are lock free relative to the CPython
interpreter. It is evil, but very cheap compared to using
threading.Lock. In fact it has exactly zero overhead: the GIL is there
anyway, and it is already acquired!

The safest placement of a _Py_Ticker hack is in a context manager. We
have to make sure it gets restored to a sane value, even in presence
of an exception.

Enjoy :)



P.S. If you get smart and think sys.setcheckinterval(sys.maxint) would
do the trick, you are actually not that smart. It will lock our own
thread out with a Poisson rate of 1./sys.getcheckinterval(). Thus we
have to attack _Py_Ticker from C.

P.P.S. And when we have taken control over the GIL, don't do anything
that might release it from C (like reading from a file). We cannot
afford to loose it :)





-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Lockless algorithms in python (Nothing to do with GIL)

2010-06-28 Thread sturlamolden

  Many lockless algorithms that I have looked at thusfar require a
  Compare and Swap operation. Does python have an equivalent to this?
  Is python high level enough that it has something better than this or
  it simply doesn't need it?

Python does have a GIL, and contrary to the title, this has a lot to
do with the GIL. Specifically, any set of operations protected by the
GIL are atomic in the CPython interpreter.

We can therefore get global synchronization 'for free', allowing 'lock
free' data structures that avoid using threading.Lock, and replaces it
with, well, nothing. :)

How to do it (it takes tiny amounts of C or Cython):

Take at look at page 14:

   http://www.dabeaz.com/python/UnderstandingGIL.pdf

The critical part is the variable _Py_Ticker in Python's source file
ceval.c, which is declared volatile int. As it is not declared static
it is a global variable, and we can actually manipulate its value from
a C extension:

   extern volatile int _Py_Ticker; // evil!!!

Now imagine temporarily setting _Py_Ticker to some ridiculous high
value like 0x7fff. That locks out all other threads, practically
forever, until we restore _Py_Ticker back to it's original value.

What this gives us is global synchronization for free! The GIL is
there anyway, so it adds no additional overhead.

By messing with _Py_Ticker we can force blocks of Python code to
become atomic in the interpreter. With this hack we can therefore
create data structures that are lock free relative to the CPython
interpreter. It is evil, but very cheap compared to using
threading.Lock. In fact it has exactly zero overhead: the GIL is there
anyway, and it is already acquired!

The safest placement of a _Py_Ticker hack is in a context manager. We
have to make sure it gets restored to a sane value, even in presence
of an exception.

Enjoy :)



P.S. If you get smart and think sys.setcheckinterval(sys.maxint) would
do the trick, you are actually not that smart. It will lock our own
thread out with a Poisson rate of 1./sys.getcheckinterval(). Thus we
have to attack _Py_Ticker from C.

P.P.S. And when we have taken control over the GIL, don't do anything
that might release it from C (like reading from a file). We cannot
afford to loose it :)






-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Lockless algorithms in python (Nothing to do with GIL)

2010-06-28 Thread Ryan Kelly
On Mon, 2010-06-28 at 19:45 -0700, sturlamolden wrote:
   Many lockless algorithms that I have looked at thusfar require a
   Compare and Swap operation. Does python have an equivalent to this?
   Is python high level enough that it has something better than this or
   it simply doesn't need it?
 
 Python does have a GIL, and contrary to the title, this has a lot to
 do with the GIL. Specifically, any set of operations protected by the
 GIL are atomic in the CPython interpreter.
 
 We can therefore get global synchronization 'for free', allowing 'lock
 free' data structures that avoid using threading.Lock, and replaces it
 with, well, nothing. :)
 
 How to do it (it takes tiny amounts of C or Cython):
 
 Take at look at page 14:
 
http://www.dabeaz.com/python/UnderstandingGIL.pdf
 
 The critical part is the variable _Py_Ticker in Python's source file
 ceval.c, which is declared volatile int. As it is not declared static
 it is a global variable, and we can actually manipulate its value from
 a C extension:
 
extern volatile int _Py_Ticker; // evil!!!
 
 Now imagine temporarily setting _Py_Ticker to some ridiculous high
 value like 0x7fff. That locks out all other threads, practically
 forever, until we restore _Py_Ticker back to it's original value.
 
 What this gives us is global synchronization for free! The GIL is
 there anyway, so it adds no additional overhead.

Very interesting idea.  Will it work if accessed through ctypes?

   ticker = ctypes.c_int.in_dll(ctypes.pythonapi,_Py_Ticker)
   ticker.value = 0x7fff

Or does ctypes muck with the GIL in a way that would break this idea?


  Ryan


-- 
Ryan Kelly
http://www.rfk.id.au  |  This message is digitally signed. Please visit
r...@rfk.id.au|  http://www.rfk.id.au/ramblings/gpg/ for details



signature.asc
Description: This is a digitally signed message part
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Lockless algorithms in python (Nothing to do with GIL)

2010-06-28 Thread sturlamolden
On 29 Jun, 05:11, Ryan Kelly r...@rfk.id.au wrote:

 Very interesting idea.  Will it work if accessed through ctypes?

    ticker = ctypes.c_int.in_dll(ctypes.pythonapi,_Py_Ticker)
    ticker.value = 0x7fff

 Or does ctypes muck with the GIL in a way that would break this idea?


 ctypes.pythonapi
PyDLL 'python dll', handle 1e00 at 1ffead0

It's a PyDLL, so it should not mock with the GIL.


from contextlib import contextmanager
import ctypes
_Py_Ticker = ctypes.c_int.in_dll(ctypes.pythonapi,_Py_Ticker)

@contextmanager
def atomic():
tmp = _Py_Ticker.value
_Py_Ticker.value = 0x7fff
yield
_Py_Ticker.value = tmp - 1


Now we can do

with atomic():
# whatever
pass

Just make sure we don't call extension code that releases the GIL :)





-- 
http://mail.python.org/mailman/listinfo/python-list