On 21/03/2016 02:21, Terry Reedy wrote:
> On 3/20/2016 9:15 PM, BartC wrote:
>> http://pastebin.com/dtM8WnFZ
>> This is a test of a character-at-a-time task in Python;

> I disagree.  It is a test of C code re-written in ludicrously crippled
> Python.  No use of the re module,

You can't use the re module for this kind of test. It would be like writing a C compiler in Python like this:

  system("gcc "+filename)

(or whatever the equivalent is in Python) and claiming the compilation speeds are due to Python's fast byte-code.

> designed for tasks like this,

(I've tested someone's parser written in Python using regular expressions, I seem to remember it was still pretty slow.)

 > but exactly such tasks are what I often use dynamic languages for.

> For instance, there are about 15 clauses like
> ---
> elif c=="?":
>     lxsymbol = question_sym
>     return
> ---
>
> I believe it would be much faster to combine these in one clause. First
> define simple_symbols = {'?': question_sym, ...}. Then
> elif c in simple_symbols:
>     lxsymbol = simple_symbols[c]
>     return
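As a concrete sketch of that combined clause (the symbol constants here are hypothetical stand-ins for the lexer's own):

```python
# Hypothetical symbol constants standing in for the lexer's own.
QUESTION_SYM = "question_sym"
COMMA_SYM = "comma_sym"
SEMI_SYM = "semi_sym"

# One dict replaces the ~15 separate elif clauses.
simple_symbols = {"?": QUESTION_SYM, ",": COMMA_SYM, ";": SEMI_SYM}

def lookup_symbol(c):
    # A single membership test plus lookup instead of a linear elif chain.
    if c in simple_symbols:
        return simple_symbols[c]
    return None
```

(simple_symbols.get(c) would collapse the membership test and the lookup into one call.)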


I tried that (for 11 clauses), and it actually got a bit slower when the combined test was placed towards the end of the chain! But faster if placed nearer the beginning.

I also tweaked the way each identifier name is built up (taking one slice once the limits of the name are established, instead of building it up a character at a time).
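A minimal sketch of that tweak, with hypothetical names, assuming the lexer scans a string by index:

```python
def read_name_chars(text, i):
    # Original approach: build the name a character at a time.
    name = ""
    while i < len(text) and (text[i].isalnum() or text[i] == "_"):
        name += text[i]
        i += 1
    return name, i

def read_name_slice(text, i):
    # Tweaked approach: find where the name ends, then take one slice.
    start = i
    while i < len(text) and (text[i].isalnum() or text[i] == "_"):
        i += 1
    return text[start:i], i
```

Each `name += text[i]` creates a new string object, so taking one slice at the end avoids a pile of intermediate allocations.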

The "\r" check was removed (in text mode it won't occur); the eof check was moved last (as it will only succeed once); and the chr(0) check was removed, as chr(0) isn't used (it would have zero cost in a jump-table switch or a function table, but it does cost here).

Overall, Python 3's throughput increased from 33Klps to 43Klps (and Python 2's from 43Klps to 53Klps).

HOWEVER: PyPy doesn't seem to like those dict lookups: its throughput dropped from 105Klps (after those other changes) to 29Klps when the dict lookup was used. Odd.

(I haven't tried this on Ubuntu, which seems to have snappier versions of both Python 2 and PyPy, but that's a bit of a pain to test.)

> In any case, the O(k), where k is the number of alternatives, linear
> search should be replaced by an O(log k) binary search (nested if-else
> statement) or O(1) hashed search (with a dictionary mapping chars to
> functions).
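The O(log k) option might look like this nested if-else, which compares against a middle character first so each test halves the remaining alternatives (the five symbols chosen here are hypothetical):

```python
def classify_binary(c):
    # Five alternatives in sorted order: ! + ; ? }
    # The first test splits them around the median character ';'.
    if c < ";":
        if c == "!":
            return "bang_sym"
        elif c == "+":
            return "plus_sym"
    else:
        if c == ";":
            return "semi_sym"
        elif c == "?":
            return "question_sym"
        elif c == "}":
            return "rbrace_sym"
    return None
```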

>> I started off trying to write it in a more efficient way that would suit
>> Python better, but quickly tired of that. I should be able to express
>> the code how I want.

> Of course you can.  But you cannot write in a crippled Python subset and
> fairly claim that the result represents idiomatic Python code.

For Python I would have used a table of 0..255 functions, indexed by the ord() code of each character. So all 52 letter codes map to the same name-handling function. (No dict is needed at this point.)

But that's a hell of a lot of infrastructure to set up, only to find that Python's function call overheads mean it's not much faster (or maybe it's slower) than using an if-elif chain.

(I've no idea if it might be much faster or not. And yet, having said that, I can't resist trying it out! But it'll have to be a bit later.)
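Such a table might be set up like this (a sketch with hypothetical handler and symbol names; only the 52 letters map to the name handler, as described above):

```python
import string

# Hypothetical handlers; a real lexer's would consume further
# characters and set state rather than just return a symbol name.
def lex_name(c):
    return "name_sym"

def lex_question(c):
    return "question_sym"

def lex_error(c):
    return None

# Build the 0..255 table once, indexed by ord(c); per character there
# is no dict lookup, just a list index and a function call.
dispatch = [lex_error] * 256
for ch in string.ascii_letters:      # all 52 letters share one handler
    dispatch[ord(ch)] = lex_name
dispatch[ord("?")] = lex_question

def dispatch_char(c):
    return dispatch[ord(c)](c)
```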

--
Bartc
--
https://mail.python.org/mailman/listinfo/python-list
