On 21/03/2016 02:21, Terry Reedy wrote:
> On 3/20/2016 9:15 PM, BartC wrote:
>> http://pastebin.com/dtM8WnFZ
>> This is a test of a character-at-a-time task in Python;

> I disagree.  It is a test of C code re-written in ludicrously crippled
> Python.  No use of the re module,

You can't use the re module for this kind of test. It would be like writing a C compiler in Python like this:

  system("gcc "+filename)

(or whatever the equivalent is in Python) and claiming the compilation speeds are due to Python's fast byte-code.

> designed for tasks like this,

(I've tested someone's parser written in Python using regular expressions, I seem to remember it was still pretty slow.)

 > but exactly such tasks are what I often use dynamic languages for.

> For instance, there are about 15 clauses like
> ---
> elif c=="?":
>     lxsymbol = question_sym
>     return
> ---
>
> I believe it would be much faster to combine these in one clause. First
> define simple_symbols = {'?': question_sym, ...}. Then
> elif c in simple_symbols:
>     lxsymbol = simple_symbols[c]
>     return
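As a concrete sketch of that combined clause (the symbol constants here are hypothetical stand-ins for the lexer's own):

```python
# Hypothetical symbol constants standing in for the lexer's own.
QUESTION_SYM = "question_sym"
COMMA_SYM = "comma_sym"
SEMI_SYM = "semi_sym"

# One dict replaces the ~15 separate elif clauses.
simple_symbols = {"?": QUESTION_SYM, ",": COMMA_SYM, ";": SEMI_SYM}

def lookup_symbol(c):
    # A single membership test plus lookup instead of a linear elif chain.
    if c in simple_symbols:
        return simple_symbols[c]
    return None
```

(simple_symbols.get(c) would collapse the membership test and the lookup into one call.)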


I tried that (for 11 clauses), and it actually got a bit slower when the combined test was placed towards the end of the chain! But faster if placed nearer the beginning.

I also tweaked the way each identifier name is built up (taking one slice once the limits of the name are established, instead of building it up a character at a time).
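A minimal sketch of that tweak, with hypothetical names, assuming the lexer scans a string by index:

```python
def read_name_chars(text, i):
    # Original approach: build the name a character at a time.
    name = ""
    while i < len(text) and (text[i].isalnum() or text[i] == "_"):
        name += text[i]
        i += 1
    return name, i

def read_name_slice(text, i):
    # Tweaked approach: find where the name ends, then take one slice.
    start = i
    while i < len(text) and (text[i].isalnum() or text[i] == "_"):
        i += 1
    return text[start:i], i
```

Each `name += text[i]` creates a new string object, so taking one slice at the end avoids a pile of intermediate allocations.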

The "\r" check was removed (in text mode it won't occur); the eof check was moved last (as it will only succeed once); and the chr(0) check was removed, as chr(0) isn't used (it would have zero cost in a jump-table switch or a function table, but it does cost here).

Overall, Python 3's throughput increased from 33Klps to 43Klps (and Python 2's from 43Klps to 53Klps).

HOWEVER: PyPy doesn't seem to like those dict lookups: its throughput dropped from 105Klps (after those other changes) to 29Klps when the dict lookup was used. Odd.

(I haven't tried this on Ubuntu, which seems to have snappier versions of both Python 2 and PyPy, but that's a bit of a pain to test.)

> In any case, the O(k), where k is the number of alternatives, linear
> search should be replaced by an O(log k) binary search (nested if-else
> statement) or O(1) hashed search (with a dictionary mapping chars to
> functions).
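The O(log k) option might look like this nested if-else, which compares against a middle character first so each test halves the remaining alternatives (the five symbols chosen here are hypothetical):

```python
def classify_binary(c):
    # Five alternatives in sorted order: ! + ; ? }
    # The first test splits them around the median character ';'.
    if c < ";":
        if c == "!":
            return "bang_sym"
        elif c == "+":
            return "plus_sym"
    else:
        if c == ";":
            return "semi_sym"
        elif c == "?":
            return "question_sym"
        elif c == "}":
            return "rbrace_sym"
    return None
```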

>> I started off trying to write it in a more efficient way that would suit
>> Python better, but quickly tired of that. I should be able to express
>> the code how I want.

> Of course you can.  But you cannot write in a crippled Python subset and
> fairly claim that the result represents idiomatic Python code.

For Python I would have used a table of 0..255 functions, indexed by the ord() code of each character. So all 52 letter codes map to the same name-handling function. (No dict is needed at this point.)

But that's a hell of a lot of infrastructure to set up, only to find that Python's function call overheads mean it's not much faster (or maybe it's slower) than using an if-elif chain.

(I've no idea if it might be much faster or not. And yet, having said that, I can't resist trying it out! But it'll have to be a bit later.)
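Such a table might be set up like this (a sketch with hypothetical handler and symbol names; only the 52 letters map to the name handler, as described above):

```python
import string

# Hypothetical handlers; a real lexer's would consume further
# characters and set state rather than just return a symbol name.
def lex_name(c):
    return "name_sym"

def lex_question(c):
    return "question_sym"

def lex_error(c):
    return None

# Build the 0..255 table once, indexed by ord(c); per character there
# is no dict lookup, just a list index and a function call.
dispatch = [lex_error] * 256
for ch in string.ascii_letters:      # all 52 letters share one handler
    dispatch[ord(ch)] = lex_name
dispatch[ord("?")] = lex_question

def dispatch_char(c):
    return dispatch[ord(c)](c)
```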

--
Bartc
--
https://mail.python.org/mailman/listinfo/python-list
