On 21/03/2016 23:20, Dennis Lee Bieber wrote:
On Mon, 21 Mar 2016 17:31:21 +0000, BartC <b...@freeuk.com> declaimed the
following:

I wasn't going to post it but here it is anyway:

http://pastebin.com/FLbWSdpT

(I've added some spaces for your benefit. This also builds a histogram of names so as to do something useful. Note that despite my concerns about speed, this module can process itself in around 100ms.)


def readtoken(psource):
        global lxsptr, lxsymbol

        Why is "lxsymbol" a global, and not something returned by the function
(I can understand your making lxsptr global as you intend to come back in
with it later).

Ideally there would be a descriptor or handle passed around which contains the current state of the tokeniser, and where you stick the current token values. But for a speed test, I was worried about attribute lookups.

In the first Python version, I used 'nonlocals' (belonging to an enclosing function), but they were just as slow as globals!
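To show what I mean by a state handle (this is a sketch with made-up names, not the pastebin code), the tokeniser state could be one object, but then every access like st.pos is an attribute lookup, which is the overhead I was worried about:

```python
# Hypothetical sketch: tokeniser state carried in one object instead of
# globals. Each "st.pos" access costs an attribute lookup per step.
class LexState:
    def __init__(self, source):
        self.source = source   # the text being tokenised
        self.pos = 0           # current read position (what lxsptr holds)
        self.symbol = None     # last token kind (what lxsymbol holds)

def read_char(st):
    # Advance one character; the equivalent of psource[lxsptr] plus lxsptr += 1
    c = st.source[st.pos]
    st.pos += 1
    return c

st = LexState("abc")
assert read_char(st) == "a"
assert st.pos == 1
```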

        lxsubcode = 0

        Unused in the rest of the sample

This is a global. Some token types will set it; for the rest it's neater if it's zeroed first.


        while (1):

        while True:             #At least since Python 2.x... No () needed

                c=psource[lxsptr]

        Is the spacebar broken? How about some whitespace between language
elements... They don't take up that much memory

(It's not broken but it wouldn't be consistent.)

        Given that you state you expect to only be working with 8-bit bytes...

                if d<256:

this will always be true

Unfortunately Python 3 doesn't play along. There could be some Unicode characters in the string, with values above 255. (And if I used byte-sequences, I don't know what would work and what wouldn't.)
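That is what the d<256 test is guarding against. A minimal sketch (handler names are mine, not from the posted code): any codepoint above 255 falls back to a catch-all handler, so the 256-entry table stays safe on Python 3 str:

```python
# Sketch: 256-entry dispatch table, with a guard for codepoints >= 256.
def handle_other(source, c):
    return "sym_other"

def handle_letter(source, c):
    return "sym_name"

disptable = [handle_other] * 256
for i in range(ord('a'), ord('z') + 1):
    disptable[i] = handle_letter

def dispatch(source, c):
    d = ord(c)
    if d < 256:
        return disptable[d](source, c)
    return handle_other(source, c)   # Unicode beyond Latin-1

assert dispatch("x", "x") == "sym_name"
assert dispatch("€", "€") == "sym_other"   # ord('€') == 8364
```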


                        lxsymbol = disptable[d](psource,c)

        Looks like you are indexing a 256-element table of functions, using the
numeric value of the character/byte as the index... Only to then pass your
entire source string along with the character from it to the function.

No, it passes only a reference to the entire string; the current position is in 'lxsptr'. Yes, the mix of parameters and globals is messy. All globals might be better (in the original non-Python language, 'globals' would have module scope, and would not be visible outside the tokeniser module unless explicitly exported. Semi-global...).
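The reference point is easy to check (a quick sketch, not from the post): the string the function receives is the very same object, not a copy:

```python
# Passing a large string to a function passes only a reference;
# nothing is copied at the call site.
s = "x" * 1_000_000

def f(psource):
    return id(psource)

assert f(s) == id(s)   # same object inside the function
```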

        I have no idea what your disptable functions look like but...

        while psource:
                c, psource = psource[0], psource[1:]

I don't think this will work. Slicing creates a hard copy of the rest of the string. Performance is going to be n-squared.

(I tried a mock-up of this line, working with a duplicate of the data; the time to process a 600-line module doubled. I'm still waiting on the 6MB data, and it's been seven minutes so far; it normally takes 7 seconds.

I was surprised at one time that slices don't create 'views', but I've since implemented view-slices and I can appreciate the problems.)
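To spell out the difference (sketch, not the pastebin code): the slicing loop copies the remaining tail of the string on every step, so total work grows as n-squared; the indexing loop reads one character per step and stays linear:

```python
# Contrast of the two loop shapes over the same source string.
def count_slicing(psource):
    n = 0
    while psource:
        c, psource = psource[0], psource[1:]   # copies the whole tail each step
        n += 1
    return n

def count_indexing(psource):
    n, ptr = 0, 0
    while ptr < len(psource):
        c = psource[ptr]                        # no copy, just an indexed read
        ptr += 1
        n += 1
    return n

s = "token " * 1000
assert count_slicing(s) == count_indexing(s) == len(s)
```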

--
Bartc
--
https://mail.python.org/mailman/listinfo/python-list
