Re: User class binary ops seem too slow (was re: GIL detector)

2014-08-18 Thread Steven D'Aprano
On Mon, 18 Aug 2014 00:43:58 -0400, Terry Reedy wrote:

 >>> timeit.repeat('1+1')
 [0.04067762117549266, 0.019206152658126363, 0.018796680446902643]

I think you have been tripped up by the peephole optimizer. I'm not 
entirely certain, but that's probably just measuring the overhead of 
evaluating the constant 2. The same applies to your other constant + 
constant tests.

This is in Python 3.3, and is suggestive:

py> bc = compile('1+1', '', 'exec')
py> from dis import dis
py> dis(bc)
  1           0 LOAD_CONST               2 (2)
              3 POP_TOP
              4 LOAD_CONST               1 (None)
              7 RETURN_VALUE



-- 
Steven
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: GIL detector

2014-08-18 Thread Grant Edwards
On 2014-08-17, Stefan Behnel stefan...@behnel.de wrote:
 Steven D'Aprano wrote on 17.08.2014 at 16:21:
 I wonder whether Ruby programmers are as obsessive about
 Ruby's GIL?

 I actually wonder more whether Python programmers are really all that
 obsessive about CPython's GIL.

[...]

 Personally, I like the GIL. It helps me keep my code simpler and more
 predictable. I don't have to care about threading issues all the time and
 can otherwise freely choose the right model of parallelism that suits my
 current use case when the need arises (and threads are rarely the right
 model). I'm sure that's not just me.

Those are pretty much my feelings exactly.  I've been writing Python
apps for 15 years.  They're mostly smallish utilities, network and
serial comm stuff, a few WxWidgets and GTK apps, some IMAP, SMTP and
HTTP stuff. Many are multi-threaded, and some of the mesh data
analysis and visualization ones ran for a few tens of minutes.  I
don't remember a single instance where the GIL was even as much as
annoying.  The GIL means that multi-threaded apps mostly just work and
you don't have to sprinkle mutexes all over your code the way you do
in C using pthreads.  You do sometimes need to use mutexes in Python,
but only at a higher layer -- there's a whole lower layer of mutexes
that you don't need because of the GIL.
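
To make that concrete, here's a minimal sketch of the kind of
higher-layer lock I mean (illustrative only -- the names and counts
are made up):

import threading

counter = 0
lock = threading.Lock()

def increment(n):
    # counter += 1 compiles to several bytecodes (load, add, store),
    # so the GIL alone does not make it atomic across threads.
    global counter
    for _ in range(n):
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # reliably 400000 with the lock; possibly less without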

-- 
Grant Edwards               grant.b.edwards at gmail.com
Yow! On the road, ZIPPY is a pinhead without a purpose, but never
without a POINT.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: GIL detector

2014-08-17 Thread Johannes Bauer
On 17.08.2014 16:21, Steven D'Aprano wrote:
 Coincidentally after reading Armin Ronacher's criticism of the GIL in
 Python:
 
 http://lucumr.pocoo.org/2014/8/16/the-python-i-would-like-to-see/

Sure that's the right one? The article you linked doesn't mention the GIL.

 I stumbled across this GIL detector script:
 
 http://yuvalg.com/blog/2011/08/09/the-gil-detector/
 
 Running it on a couple of my systems, I get these figures:
 
 CPython 2.7: 0.8/2 cores
 CPython 3.3: 1.0/2 cores
 
 Jython 2.5:  2.3/4 cores
 CPython 2.6: 0.7/4 cores
 CPython 3.3: 0.7/4 cores

CPython 3.4: 0.9/4 cores

Cheers,
Johannes

-- 
 Where had you predicted the quake EXACTLY, again?
 At least not publicly!
Ah, the latest and to this day most ingenious stroke of our great
cosmologist: the secret prediction.
 - Karl Kaos on Rüdiger Thomas in dsa hidbv3$om2$1...@speranza.aioe.org
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: GIL detector

2014-08-17 Thread Chris Angelico
On Mon, Aug 18, 2014 at 12:21 AM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
 The day will come that even the cheapest, meanest entry-level PC will come
 standard with 8 cores and the GIL will just be an embarrassment, but today
 is not that day. I wonder whether Ruby programmers are as obsessive about
 Ruby's GIL?

I'm kinda waiting till I see tons of awesome asyncio code in the wild,
but the way I'm seeing things, the world seems to be moving toward a
model along these lines:

0) Processes get spawned for any sort of security/protection boundary.
Sandboxing Python-in-Python (or any other high level language) just
isn't worth the effort.

1) One process, in any high level language, multiplexes requests but
uses just one CPU core.

2) Something at a higher level dispatches requests between multiple
processes - eg Passenger with Apache.

So, if you want to take advantage of your eight cores, you run eight
processes, and have Apache spread the load between them. Each process
might handle a large number of concurrent requests, but all through
async I/O and a single dispatch loop. Even the use of multiple threads
seems to be dying out (despite being quite handy when lower-level
functions will release the GIL) in favour of the multiple process
model.
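
As a rough sketch of that model (purely illustrative -- the handler
and request numbers are made up, and the higher-level dispatcher would
really be something like Apache rather than multiprocessing):

import asyncio
import multiprocessing
import os

@asyncio.coroutine
def handle_request(n):
    # Stand-in for real async I/O (a database call, an upstream fetch).
    yield from asyncio.sleep(0.01)
    print('worker %d handled request %d' % (os.getpid(), n))

def worker(request_ids):
    # One single-threaded event loop per process, multiplexing many
    # concurrent requests with no threads at all.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(
        asyncio.gather(*[handle_request(n) for n in request_ids]))
    loop.close()

if __name__ == '__main__':
    procs = [multiprocessing.Process(target=worker,
                                     args=(range(i * 10, i * 10 + 10),))
             for i in range(4)]   # one process per core
    for p in procs:
        p.start()
    for p in procs:
        p.join()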

I'm just not sure how, with that kind of model, to have processes
interact with each other. It's fine when every request is handled
perfectly independently, but what if you want per-user state, and you
can't guarantee that one user's requests will all come to the same
process? You have to push everything through a single serialized
storage vault (probably a database), which is then going to become the
bottleneck.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: GIL detector

2014-08-17 Thread Steven D'Aprano
Johannes Bauer wrote:

 On 17.08.2014 16:21, Steven D'Aprano wrote:
 Coincidentally after reading Armin Ronacher's criticism of the GIL in
 Python:
 
 http://lucumr.pocoo.org/2014/8/16/the-python-i-would-like-to-see/
 
 Sure that's the right one? The article you linked doesn't mention the GIL.

Search for "global interpreter lock"; there are at least three references
to it.

Okay, the post is not *specifically* criticism of the GIL, but of the design
decision which makes the GIL necessary.


-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: GIL detector

2014-08-17 Thread Stefan Behnel
Steven D'Aprano wrote on 17.08.2014 at 16:21:
 I wonder whether Ruby programmers are as obsessive about
 Ruby's GIL?

I actually wonder more whether Python programmers are really all that
obsessive about CPython's GIL. Sure, there are always the Loud Guys who
speak up when they feel like no-one's mentioned it for too long, but I'd
expect the vast majority to be just ok with the status quo and not think
about it most of the time. Or, well, think about it when one of the Loud
Guys takes the megaphone, but then put their thoughts back in the attic and
keep doing their daily work.

Personally, I like the GIL. It helps me keep my code simpler and more
predictable. I don't have to care about threading issues all the time and
can otherwise freely choose the right model of parallelism that suits my
current use case when the need arises (and threads are rarely the right
model). I'm sure that's not just me.

Stefan


-- 
https://mail.python.org/mailman/listinfo/python-list


RE: GIL detector

2014-08-17 Thread Joseph L. Casale
 I don't have to care about threading issues all the time and
 can otherwise freely choose the right model of parallelism that suits my
 current use case when the need arises (and threads are rarely the right
 model). I'm sure that's not just me.

The sound bite of a loyal Python coder :)

If it weren't for these useless threads, you wouldn't have even been able
to send that message, let alone do anything on a computer for that matter.

That generalization is a bit broad; the argument becomes pointless when it
moves away from a purely technical basis to an emotional one. A failure to
separate the two pigeonholes the progress of the language and hides the
underlying constraints, which is what Armin's post was trying to bring
forward.

It's naïve to think the first crack of anything is immune to refactoring.

No one said Python doesn't have any merit, and I honestly don't know of a
perfect language that has no need for improvement somewhere. I am sure,
however, that the topic is a concern for many use cases (not all).

jlc
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: GIL detector

2014-08-17 Thread Chris Angelico
On Mon, Aug 18, 2014 at 1:26 AM, Stefan Behnel stefan...@behnel.de wrote:
 I actually wonder more whether Python programmers are really all that
 obsessive about CPython's GIL. Sure, there are always the Loud Guys who
 speak up when they feel like no-one's mentioned it for too long, but I'd
 expect the vast majority to be just ok with the status quo and not think
 about it most of the time. Or, well, think about it when one of the Loud
 Guys takes the megaphone, but then put their thoughts back in the attic and
 keep doing their daily work.

 Personally, I like the GIL. It helps me keep my code simpler and more
 predictable. I don't have to care about threading issues all the time and
 can otherwise freely choose the right model of parallelism that suits my
 current use case when the need arises (and threads are rarely the right
 model). I'm sure that's not just me.

The GIL doesn't prevent threads, even. It just affects when context
switches happen and what can run in parallel. As I've often said,
threads make fine sense for I/O operations; although that may start to
change - I'm sure the day will come when asyncio is the one obvious
way to do multiplexed I/O in Python.
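
For instance (a toy illustration -- the URL is just a placeholder),
I/O-bound threads overlap fine because CPython drops the GIL while a
thread is blocked on the network:

import threading
import urllib.request

def fetch(url):
    # The GIL is released during the blocking reads, so these
    # downloads genuinely run in parallel.
    with urllib.request.urlopen(url) as resp:
        print(url, len(resp.read()))

urls = ['http://example.com/'] * 5   # placeholder targets
threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()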

The GIL means you can confidently write code that uses CPython's
refcounting mechanisms (whether overtly, in an extension module, or
implicitly, by just doing standard Python operations), and be
confident that internal state won't be corrupted... or, more
accurately, be confident that you're not paying a ridiculous
performance penalty (even in a single-threaded program) for the
guarantee that internal state won't be corrupted. Pike has a similar
global lock; so, I believe, does Ruby, and so do several other
languages. It's way more efficient than a lot of the alternatives. I
can't speak for Ruby, but Pike has had periodic discussions about
lessening the global lock's impact (one such way is isolating
purely-local objects from those with global references; if an object's
referenced only from the execution stack, there's no way for any other
thread to see it, ergo it's safe to work with sans locks - considering
that a lot of data manipulation will be done in this way, this could
give a lot of parallelism), and yet the GIL still exists in Python and
Pike, because it really is better than the alternatives - or at least,
insufficiently worse to justify the transitional development.

We do not have a problem here.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: GIL detector

2014-08-17 Thread Chris Angelico
On Mon, Aug 18, 2014 at 2:01 AM, Joseph L. Casale
jcas...@activenetwerx.com wrote:
 If it weren't for these useless threads, you wouldn't have even been able
 to send that message, let alone do anything on a computer for that matter.

Not sure about that. I think it would be entirely possible to build a
computer that has no C threads, just processes (with separate memory
spaces) and HLL threads governed by GILs - that is, each process
cannot possibly consume more than 100% CPU time. Threads aren't
inherently required for anything, but they do make certain jobs
easier.

When I grew up with threads, multi-core home computers simply didn't
exist, so in effect the *entire computer* had a GIL. Threads still had
their uses (fast response on thread 0 makes for a responsive GUI, then
the heavy processing gets done on a spun-off thread with presumably
lower scheduling priority), and that's not changing. Requiring that
only one thread of any given process be running at a time is just a
minor limitation, and one that I would accept as part of the
restrictions of high level languages.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


RE: GIL detector

2014-08-17 Thread Steven D'Aprano
Joseph L. Casale wrote:

 I don't have to care about threading issues all the time and
 can otherwise freely choose the right model of parallelism that suits my
 current use case when the need arises (and threads are rarely the right
 model). I'm sure that's not just me.
 
 The sound bite of a loyal Python coder:)

Who are you replying to? Please give attribution when you quote someone.


 If it weren't for these useless threads, you wouldn't have even been
 able to send that message, let alone do anything on a computer for that
 matter.

I don't see anyone except you calling threads useless. That's your word.
The person you quoted said that threads are rarely the right model,
which is not the same thing. (And probably a bit strong.)

However, you are factually wrong. Computers existed for decades before
lightweight threads were invented. Computers were capable of networking and
messaging decades ago, before the Internet existed, and they did it without
threads, supporting hundreds or even thousands of users at a time. They did
it with multiple processes, not threads, and today we have the same choice.

There are pros and cons to both multithreading and multiprocessing. Those
who insist that Python is completely broken because some implementations
cannot take advantage of multiple cores from threads have missed the point
that you can use a separate process for each core instead.
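
A minimal sketch of that approach (the workload here is a made-up
stand-in for real number-crunching):

from multiprocessing import Pool

def cpu_bound(n):
    # Pure-Python CPU work; within one process the GIL would serialize
    # it, but each pool worker is a separate process with its own GIL.
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    with Pool() as pool:              # defaults to one worker per core
        print(pool.map(cpu_bound, [10**6] * 8))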

Ironically, using threads for email in Python is probably going to work
quite well, since it is limited by I/O and not CPU.


-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list


User class binary ops seem too slow (was re: GIL detector)

2014-08-17 Thread Terry Reedy
In a post about CPython's GIL, Steven D'Aprano pointed to Armin 
Ronacher's criticism of the internal type slots used for dunder methods.


 http://lucumr.pocoo.org/2014/8/16/the-python-i-would-like-to-see/

I found the following interesting.

Since we have an __add__ method, the interpreter will set this up in a 
slot. So how fast is it? When we do a + b we will use the slots, so here 
is what it times at:


$ python3 -mtimeit -s 'from x import A; a = A(); b = A()' 'a + b'
1000000 loops, best of 3: 0.256 usec per loop

If we do however a.__add__(b) we bypass the slot system. Instead the 
interpreter is looking in the instance dictionary (where it will not 
find anything) and then looks in the type's dictionary where it will 
find the method. Here is where that clocks in at:


$ python3 -mtimeit -s 'from x import A; a = A(); b = A()' 'a.__add__(b)'
10000000 loops, best of 3: 0.158 usec per loop

Can you believe it: the version without slots is actually faster. What 
magic is that? I'm not entirely sure what the reason for this is, ...


Curious myself, I repeated the result on my Win7 machine and got almost 
the same numbers.


>>> import timeit
>>> class A:
	def __add__(self, other): return 2

>>> timeit.repeat('a + b', 'from __main__ import A; a=A(); b=A()')
[0.26080520927348516, 0.24120280310165754, 0.2412111032140274]
>>> timeit.repeat('a.__add__(b)', 'from __main__ import A; a=A(); b=A()')
[0.17656398710346366, 0.15274235713354756, 0.1528444177747872]

First I looked at the byte code.

>>> from dis import dis
>>> dis('a+b')
  1           0 LOAD_NAME                0 (a)
              3 LOAD_NAME                1 (b)
              6 BINARY_ADD
              7 RETURN_VALUE
>>> dis('a.__add__(b)')
  1           0 LOAD_NAME                0 (a)
              3 LOAD_ATTR                1 (__add__)
              6 LOAD_NAME                2 (b)
              9 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             12 RETURN_VALUE

Next, the core of the BINARY_ADD code in Python/ceval.c:

if (PyUnicode_CheckExact(left) &&
    PyUnicode_CheckExact(right)) {
    sum = unicode_concatenate(left, right, f, next_instr);
    /* unicode_concatenate consumed the ref to v */
}
else {
    sum = PyNumber_Add(left, right);

By the language definition, PyNumber_Add must check whether 
issubclass(type(b), type(a)). If so, it tries b.__radd__(a) first. 
Otherwise it tries a.__add__(b). So BINARY_ADD has extra overhead before 
it calls __add__. Enough to explain the .09 microsecond difference 
between .25 and .16 microseconds?
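
In rough Python, the dispatch looks something like this (a simplified
sketch, not the actual C logic -- the real code also requires that the
subclass actually override __radd__, handles missing methods, and more):

def binary_add(a, b):
    # Sketch of PyNumber_Add's dispatch; assumes both operands define
    # __add__ and __radd__. (Real CPython also skips the reflected
    # call entirely when both operands share a type.)
    right_first = (type(a) is not type(b)
                   and issubclass(type(b), type(a)))
    if right_first:
        result = b.__radd__(a)
        if result is not NotImplemented:
            return result
    result = a.__add__(b)
    if result is not NotImplemented:
        return result
    if not right_first:
        result = b.__radd__(a)
        if result is not NotImplemented:
            return result
    raise TypeError('unsupported operand type(s) for +')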


Let's try some builtins...

>>> timeit.repeat('1+1')
[0.04067762117549266, 0.019206152658126363, 0.018796680446902643]
>>> timeit.repeat('1.0+1.0')
[0.032686457413774406, 0.023207729064779414, 0.018793606331200863]
>>> timeit.repeat('(1.0+1j) + (1.0-1j)')
[0.037775348543391374, 0.01876409482042618, 0.018812358436889554]

>>> timeit.repeat("''+''")
[0.04073695160855095, 0.018977745861775475, 0.018800676797354754]
>>> timeit.repeat("'a'+'b'")
[0.04066932106320564, 0.01896145304840502, 0.01879268409652468]

>>> timeit.repeat('1 .__add__(1)')
[0.16622020259652004, 0.15244908649577837, 0.15047857833215517]
>>> timeit.repeat("''.__add__('')")
[0.17265801569533323, 0.1535966538865523, 0.15308880997304186]

For the common case of adding builtin numbers or empty strings, the 
binary operation is about 8 times as fast as the dict lookup and 
function call.  For empty lists, the ratio is about 3:


>>> timeit.repeat('[]+[]')
[0.09728684696551682, 0.08233527043626054, 0.08230698857164498]
>>> timeit.repeat('[].__add__([])')
[0.22780949582033827, 0.206026619382, 0.2060967092206738]

Conclusions:
1. Python-level function calls to C wrappers of C functions are as slow 
as calls to Python functions (which I already knew to be relatively slow).


2. Taking into account the interpreter's internal binary operations on 
builtin Python objects, I suspect almost everyone benefits from the slot 
optimization.


3. The total BINARY_ADD + function call time for strings and numbers, 
about .02 microseconds, is much less than the .09 difference and cannot 
account for it.


4. There might be some avoidable overhead within PyNumber_Add that only 
affects custom-class instances (but I am done for tonight ;-).


--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list