Re: User class binary ops seem too slow (was re: GIL detector)
On Mon, 18 Aug 2014 00:43:58 -0400, Terry Reedy wrote:

> timeit.repeat('1+1')
> [0.04067762117549266, 0.019206152658126363, 0.018796680446902643]

I think you have been tripped up by the keyhole optimizer. I'm not
entirely certain, but that's probably just measuring the overhead of
evaluating the constant 2. The same applies to your other
constant + constant tests. This is in Python 3.3, and is suggestive:

py> bc = compile('1+1', '', 'exec')
py> from dis import dis
py> dis(bc)
  1           0 LOAD_CONST               2 (2)
              3 POP_TOP
              4 LOAD_CONST               1 (None)
              7 RETURN_VALUE

--
Steven
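To see the folding Steven describes in timings rather than bytecode,
one can compare the constant expression with the same addition spelled
through names, which the peephole optimizer cannot fold away (a minimal
sketch; absolute numbers vary by machine):

import timeit

# '1+1' is folded to the constant 2 at compile time, so this mostly
# measures LOAD_CONST plus loop overhead.
print(min(timeit.repeat('1+1')))

# 'x+y' cannot be folded, so this pays for a real BINARY_ADD on ints.
print(min(timeit.repeat('x+y', setup='x = 1; y = 1')))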
Re: GIL detector
On 2014-08-17, Stefan Behnel <stefan...@behnel.de> wrote:
> Steven D'Aprano wrote on 17.08.2014 at 16:21:
>> I wonder whether Ruby programmers are as obsessive about Ruby's GIL?
>
> I actually wonder more whether Python programmers are really all that
> obsessive about CPython's GIL. [...] Personally, I like the GIL. It
> helps me keep my code simpler and more predictable. I don't have to
> care about threading issues all the time and can otherwise freely
> choose the right model of parallelism that suits my current use case
> when the need arises (and threads are rarely the right model). I'm
> sure that's not just me.

Those are pretty much my feelings exactly.

I've been writing Python apps for 15 years. They're mostly smallish
utilities, network and serial comm stuff, a few WxWidgets and GTK apps,
some IMAP, SMTP and HTTP stuff. Many are multi-threaded, and some of
the mesh data analysis and visualization ones ran for a few tens of
minutes. I don't remember a single instance where the GIL was even as
much as annoying.

The GIL means that multi-threaded apps mostly just work and you don't
have to sprinkle mutexes all over your code the way you do in C using
pthreads. You do sometimes need to use mutexes in Python, only at a
higher layer -- there's a whole lower layer of mutexes that you don't
need because of the GIL.

--
Grant Edwards                   grant.b.edwards at gmail.com
Yow! On the road, ZIPPY is a pinhead without a purpose, but never
without a POINT.
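A minimal sketch of the higher-layer mutex Grant means: a compound
read-modify-write such as counter += 1 spans several bytecodes, and the
GIL only serializes individual bytecodes, not the whole sequence:

import threading

counter = 0
lock = threading.Lock()

def bump(times):
    global counter
    for _ in range(times):
        with lock:          # without this, updates can be lost
            counter += 1    # read-modify-write: several bytecodes

threads = [threading.Thread(target=bump, args=(100000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # reliably 400000 with the lock; often less without it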
GIL detector
Coincidentally, after reading Armin Ronacher's criticism of the GIL in
Python:

http://lucumr.pocoo.org/2014/8/16/the-python-i-would-like-to-see/

I stumbled across this GIL detector script:

http://yuvalg.com/blog/2011/08/09/the-gil-detector/

Running it on a couple of my systems, I get these figures:

CPython 2.7: 0.8/2 cores
CPython 3.3: 1.0/2 cores

Jython 2.5:  2.3/4 cores
CPython 2.6: 0.7/4 cores
CPython 3.3: 0.7/4 cores

With IronPython, the script raises an exception.

The day will come that even the cheapest, meanest entry-level PC will
come standard with 8 cores and the GIL will just be an embarrassment,
but today is not that day. I wonder whether Ruby programmers are as
obsessive about Ruby's GIL?

--
Steven
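The linked script boils down to a simple idea: time some CPU-bound work
on one thread, time the same total work split across threads, and
report the ratio as effective cores. A simplified sketch of that idea
(not the actual script):

import threading
import time

def burn(n):
    # pure-Python busy loop; never releases the GIL voluntarily
    while n > 0:
        n -= 1

def effective_cores(work=5000000, nthreads=2):
    start = time.time()
    burn(work * nthreads)                 # all the work, one thread
    serial = time.time() - start

    threads = [threading.Thread(target=burn, args=(work,))
               for _ in range(nthreads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    parallel = time.time() - start

    return serial / parallel  # ~1.0 under a GIL, ~nthreads without one

print('%.1f effective cores' % effective_cores())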
Re: GIL detector
On 17.08.2014 16:21, Steven D'Aprano wrote:

> Coincidentally after reading Armin Ronacher's criticism of the GIL in
> Python:
> http://lucumr.pocoo.org/2014/8/16/the-python-i-would-like-to-see/

Sure that's the right one? The article you linked doesn't mention the
GIL.

> I stumbled across this GIL detector script:
> http://yuvalg.com/blog/2011/08/09/the-gil-detector/
> Running it on a couple of my systems, I get these figures:
> CPython 2.7: 0.8/2 cores
> CPython 3.3: 1.0/2 cores
> Jython 2.5:  2.3/4 cores
> CPython 2.6: 0.7/4 cores
> CPython 3.3: 0.7/4 cores

CPython 3.4: 0.9/4 cores

Cheers,
Johannes

--
Where exactly had you predicted the quake again? At least not publicly!
Ah, the latest and to this day most ingenious stunt of our great
cosmologist: the secret prediction.
 - Karl Kaos on Rüdiger Thomas in dsa <hidbv3$om2$1...@speranza.aioe.org>
Re: GIL detector
On Mon, Aug 18, 2014 at 12:21 AM, Steven D'Aprano
<steve+comp.lang.pyt...@pearwood.info> wrote:
> The day will come that even the cheapest, meanest entry-level PC will
> come standard with 8 cores and the GIL will just be an embarrassment,
> but today is not that day. I wonder whether Ruby programmers are as
> obsessive about Ruby's GIL?

I'm kinda waiting till I see tons of awesome asyncio code in the wild,
but the way I'm seeing things, the world seems to be moving toward a
model along these lines:

0) Processes get spawned for any sort of security/protection boundary.
Sandboxing Python-in-Python (or any other high level language) just
isn't worth the effort.

1) One process, in any high level language, multiplexes requests but
uses just one CPU core.

2) Something at a higher level dispatches requests between multiple
processes - eg Passenger with Apache.

So, if you want to take advantage of your eight cores, you run eight
processes, and have Apache spread the load between them. Each process
might handle a large number of concurrent requests, but all through
async I/O and a single dispatch loop. Even the use of multiple threads
seems to be dying out (despite being quite handy when lower-level
functions will release the GIL) in favour of the multiple process
model.

I'm just not sure how, with that kind of model, to have processes
interact with each other. It's fine when every request is handled
perfectly independently, but what if you want per-user state, and you
can't guarantee that one user's requests will all come to the same
process? You have to push everything through a single serialized
storage vault (probably a database), which is then going to become the
bottleneck.

ChrisA
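Chris's model, reduced to a stdlib toy instead of Apache/Passenger
(handle() here is a hypothetical stand-in for real request handling):

import multiprocessing

def handle(request):
    # each worker is a full process with its own interpreter and GIL,
    # so eight workers can keep eight cores busy
    return request.upper()

if __name__ == '__main__':
    with multiprocessing.Pool(processes=8) as pool:
        print(pool.map(handle, ['foo', 'bar', 'baz']))

Per-user state then has to live behind something like a
multiprocessing.Manager or an external database - exactly the
serialized vault, and its bottleneck, described above.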
Re: GIL detector
Johannes Bauer wrote:

> On 17.08.2014 16:21, Steven D'Aprano wrote:
>> Coincidentally after reading Armin Ronacher's criticism of the GIL
>> in Python:
>> http://lucumr.pocoo.org/2014/8/16/the-python-i-would-like-to-see/
>
> Sure that's the right one? The article you linked doesn't mention the
> GIL.

Search for "global interpreter lock"; there are at least three
references to it. Okay, the post is not *specifically* criticism of
the GIL, but of the design decision which makes the GIL necessary.

--
Steven
Re: GIL detector
Steven D'Aprano wrote on 17.08.2014 at 16:21:
> I wonder whether Ruby programmers are as obsessive about Ruby's GIL?

I actually wonder more whether Python programmers are really all that
obsessive about CPython's GIL. Sure, there are always the Loud Guys
who speak up when they feel like no-one's mentioned it for too long,
but I'd expect the vast majority to be just ok with the status quo and
not think about it most of the time. Or, well, think about it when one
of the Loud Guys takes the megaphone, but then put their thoughts back
in the attic and keep doing their daily work.

Personally, I like the GIL. It helps me keep my code simpler and more
predictable. I don't have to care about threading issues all the time
and can otherwise freely choose the right model of parallelism that
suits my current use case when the need arises (and threads are rarely
the right model). I'm sure that's not just me.

Stefan
RE: GIL detector
> I don't have to care about threading issues all the time and can
> otherwise freely choose the right model of parallelism that suits my
> current use case when the need arises (and threads are rarely the
> right model). I'm sure that's not just me.

The sound bite of a loyal Python coder :) If it weren't for these
useless threads, you wouldn't have even been able to send that
message, let alone do anything on a computer for that matter.

That generalization is a bit broad, and the argument is pointless when
it moves away from a purely technical basis to an emotional one. A
failure to separate the two pigeonholes the progress of the language
and hides underlying constraints, which is what Armin's post was
trying to bring forward.

It's naïve to think the first crack at anything is immune to
refactoring. No one said Python doesn't have any merit, and I honestly
don't know a perfect language that has no need for improvement
somewhere. I am sure, however, that the topic is a concern for many
use cases (not all).

jlc
Re: GIL detector
On Mon, Aug 18, 2014 at 1:26 AM, Stefan Behnel <stefan...@behnel.de> wrote:
> I actually wonder more whether Python programmers are really all that
> obsessive about CPython's GIL. Sure, there are always the Loud Guys
> who speak up when they feel like no-one's mentioned it for too long,
> but I'd expect the vast majority to be just ok with the status quo
> and not think about it most of the time. Or, well, think about it
> when one of the Loud Guys takes the megaphone, but then put their
> thoughts back in the attic and keep doing their daily work.
>
> Personally, I like the GIL. It helps me keep my code simpler and more
> predictable. I don't have to care about threading issues all the time
> and can otherwise freely choose the right model of parallelism that
> suits my current use case when the need arises (and threads are
> rarely the right model). I'm sure that's not just me.

The GIL doesn't prevent threads, even. It just affects when context
switches happen and what can run in parallel. As I've often said,
threads make fine sense for I/O operations; although that may start to
change - I'm sure the day will come when asyncio is the one obvious
way to do multiplexed I/O in Python.

The GIL means you can confidently write code that uses CPython's
refcounting mechanisms (whether overtly, in an extension module, or
implicitly, by just doing standard Python operations), and be
confident that internal state won't be corrupted... or, more
accurately, be confident that you're not paying a ridiculous
performance penalty (even in a single-threaded program) for the
guarantee that internal state won't be corrupted.

Pike has a similar global lock; so, I believe, does Ruby, and so do
several other languages. It's way more efficient than a lot of the
alternatives. I can't speak for Ruby, but Pike has had periodic
discussions about lessening the global lock's impact (one such way is
isolating purely-local objects from those with global references; if
an object's referenced only from the execution stack, there's no way
for any other thread to see it, ergo it's safe to work with sans locks
- considering that a lot of data manipulation will be done in this
way, this could give a lot of parallelism), and yet the GIL still
exists in Python and Pike, because it really is better than the
alternatives - or at least, insufficiently worse to justify the
transitional development.

We do not have a problem here.

ChrisA
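The I/O case is easy to demonstrate: CPython drops the GIL around
blocking socket operations, so plain threads overlap downloads just
fine (a minimal sketch; the URLs are only placeholders):

import threading
import urllib.request

URLS = ['https://www.python.org/', 'https://pypi.org/']  # placeholders

def fetch(url):
    # the GIL is released while the socket blocks, so these overlap
    with urllib.request.urlopen(url) as resp:
        print(url, len(resp.read()))

threads = [threading.Thread(target=fetch, args=(u,)) for u in URLS]
for t in threads:
    t.start()
for t in threads:
    t.join()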
Re: GIL detector
On Mon, Aug 18, 2014 at 2:01 AM, Joseph L. Casale
<jcas...@activenetwerx.com> wrote:
> If it weren't for these useless threads, you wouldn't have even been
> able to send that message, let alone do anything on a computer for
> that matter.

Not sure about that. I think it would be entirely possible to build a
computer that has no C threads, just processes (with separate memory
spaces) and HLL threads governed by GILs - that is, each process
cannot possibly consume more than 100% CPU time. Threads aren't
inherently required for anything, but they do make certain jobs
easier.

When I grew up with threads, multi-core home computers simply didn't
exist, so in effect the *entire computer* had a GIL. Threads still had
their uses (fast response on thread 0 makes for a responsive GUI, then
the heavy processing gets done on a spun-off thread with presumably
lower scheduling priority), and that's not changing. Requiring that
only one thread of any given process be running at a time is just a
minor limitation, and one that I would accept as part of the
restrictions of high level languages.

ChrisA
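That GUI pattern fits in a few lines without any toolkit: keep thread 0
free to respond and pass results back from the worker through a queue
(a sketch; heavy_work is a hypothetical stand-in for real processing):

import queue
import threading
import time

results = queue.Queue()

def heavy_work(n):
    time.sleep(2)            # stand-in for real number crunching
    results.put(n * n)

worker = threading.Thread(target=heavy_work, args=(12,))
worker.start()
print('thread 0 still responsive while the worker runs...')
worker.join()
print('result:', results.get())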
RE: GIL detector
Joseph L. Casale wrote:

>> I don't have to care about threading issues all the time and can
>> otherwise freely choose the right model of parallelism that suits my
>> current use case when the need arises (and threads are rarely the
>> right model). I'm sure that's not just me.
>
> The sound bite of a loyal Python coder :)

Who are you replying to? Please give attribution when you quote
someone.

> If it weren't for these useless threads, you wouldn't have even been
> able to send that message, let alone do anything on a computer for
> that matter.

I don't see anyone except you calling threads "useless". That's your
word. The person you quoted said that threads are rarely the right
model, which is not the same. (And probably a bit strong.)

However, you are factually wrong. Computers existed for decades before
lightweight threads were invented. Computers were capable of
networking and messaging decades ago, before the Internet existed, and
they did it without threads, supporting hundreds or even thousands of
users at a time. They did it with multiple processes, not threads, and
today we have the same choice.

There are pros and cons to both multithreading and multiprocessing.
Those who insist that Python is completely broken because some
implementations cannot take advantage of multiple cores from threads
have missed the point that you can use a separate process for each
core instead.

Ironically, using threads for email in Python is probably going to
work quite well, since it is limited by I/O and not CPU.

--
Steven
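The process-per-core approach needs nothing beyond the stdlib; for
instance, concurrent.futures gives CPU-bound code all the cores
despite the GIL (a minimal sketch):

from concurrent.futures import ProcessPoolExecutor

def cpu_bound(n):
    # pure-Python arithmetic a thread could not parallelise under the GIL
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    with ProcessPoolExecutor() as ex:  # defaults to one worker per core
        print(list(ex.map(cpu_bound, [10**6] * 4)))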
User class binary ops seem too slow (was re: GIL detector)
In a post about CPython's GIL, Steven D'Aprano pointed to Armin
Ronacher's criticism of the internal type slots used for dunder
methods.

http://lucumr.pocoo.org/2014/8/16/the-python-i-would-like-to-see/

I found the following interesting.

> Since we have an __add__ method the interpreter will set this up in a
> slot. So how fast is it? When we do a + b we will use the slots, so
> here is what it times it as:
>
> $ python3 -mtimeit -s 'from x import A; a = A(); b = A()' 'a + b'
> 100 loops, best of 3: 0.256 usec per loop
>
> If we do however a.__add__(b) we bypass the slot system. Instead the
> interpreter is looking in the instance dictionary (where it will not
> find anything) and then looks in the type's dictionary where it will
> find the method. Here is where that clocks in at:
>
> $ python3 -mtimeit -s 'from x import A; a = A(); b = A()' 'a.__add__(b)'
> 1000 loops, best of 3: 0.158 usec per loop
>
> Can you believe it: the version without slots is actually faster. What
> magic is that? I'm not entirely sure what the reason for this is,

Curious myself, I repeated the result on my Win7 machine and got
almost the same numbers.

class A:
    def __add__(self, other):
        return 2

>>> timeit.repeat('a + b', 'from __main__ import A; a=A(); b=A()')
[0.26080520927348516, 0.24120280310165754, 0.2412111032140274]
>>> timeit.repeat('a.__add__(b)', 'from __main__ import A; a=A(); b=A()')
[0.17656398710346366, 0.15274235713354756, 0.1528444177747872]

First I looked at the byte code.

>>> dis('a+b')
  1           0 LOAD_NAME                0 (a)
              3 LOAD_NAME                1 (b)
              6 BINARY_ADD
              7 RETURN_VALUE

>>> dis('a.__add__(b)')
  1           0 LOAD_NAME                0 (a)
              3 LOAD_ATTR                1 (__add__)
              6 LOAD_NAME                2 (b)
              9 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             12 RETURN_VALUE

Next, the core of the BINARY_ADD code in Python/ceval.c:

    if (PyUnicode_CheckExact(left) && PyUnicode_CheckExact(right)) {
        sum = unicode_concatenate(left, right, f, next_instr);
        /* unicode_concatenate consumed the ref to v */
    }
    else {
        sum = PyNumber_Add(left, right);

By the language definition, PyNumber_Add must check whether
issubclass(type(b), type(a)). If so, it tries b.__radd__(a) first.
Otherwise it tries a.__add__(b). So BINARY_ADD has extra overhead
before it calls __add__. Enough to explain the .09 microsecond
difference between .25 and .16 microseconds? Let's try some builtins.

>>> timeit.repeat('1+1')
[0.04067762117549266, 0.019206152658126363, 0.018796680446902643]
>>> timeit.repeat('1.0+1.0')
[0.032686457413774406, 0.023207729064779414, 0.018793606331200863]
>>> timeit.repeat('(1.0+1j) + (1.0-1j)')
[0.037775348543391374, 0.01876409482042618, 0.018812358436889554]
>>> timeit.repeat("''+''")
[0.04073695160855095, 0.018977745861775475, 0.018800676797354754]
>>> timeit.repeat("'a'+'b'")
[0.04066932106320564, 0.01896145304840502, 0.01879268409652468]
>>> timeit.repeat('1 .__add__(1)')
[0.16622020259652004, 0.15244908649577837, 0.15047857833215517]
>>> timeit.repeat("''.__add__('')")
[0.17265801569533323, 0.1535966538865523, 0.15308880997304186]

For the common case of adding builtin numbers and empty strings, the
binary operation is about 8 times as fast as the dict lookup and
function call. For empty lists, the ratio is about 3:

>>> timeit.repeat('[]+[]')
[0.09728684696551682, 0.08233527043626054, 0.08230698857164498]
>>> timeit.repeat('[].__add__([])')
[0.22780949582033827, 0.206026619382, 0.2060967092206738]

Conclusions:

1. Python-level function calls to C wrappers of C functions are as
slow as calls to Python functions (which I already knew to be
relatively slow).

2. Taking into account the interpreter's internal binary operations on
builtin Python objects, I suspect most everyone benefits from the slot
optimization.

3. The total BINARY_ADD + function call time for strings and numbers,
about .02 microseconds, is much less than the .09 difference and
cannot account for it.

4. There might be some avoidable overhead within PyNumber_Add that
only affects custom-class instances (but I am done for tonight ;-).

--
Terry Jan Reedy
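For reference, a simplified Python rendering of the dispatch that
PyNumber_Add performs - not the real C code, and it glosses over
details such as slot inheritance, but it shows the extra checks
BINARY_ADD pays for that a direct a.__add__(b) skips:

def binary_add(a, b):
    ta, tb = type(a), type(b)
    tried_radd = False
    # a right operand of a proper subclass type gets first crack
    if tb is not ta and issubclass(tb, ta) and hasattr(tb, '__radd__'):
        result = b.__radd__(a)
        tried_radd = True
        if result is not NotImplemented:
            return result
    if hasattr(ta, '__add__'):
        result = a.__add__(b)
        if result is not NotImplemented:
            return result
    if not tried_radd and tb is not ta and hasattr(tb, '__radd__'):
        result = b.__radd__(a)
        if result is not NotImplemented:
            return result
    raise TypeError('unsupported operand type(s) for +: %r and %r'
                    % (ta.__name__, tb.__name__))

All of these checks run before a custom __add__ ever does, whereas
a.__add__(b) goes straight to the attribute lookup and the call.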