On Tuesday 22 March 2016 11:49, BartC wrote:

[...]
> Ideally there would be a descriptor or handle passed around which
> contains the current state of the tokeniser, and where you stick the
> current token values. But for a speed test, I was worried about
> attribute lookups.

Bart, in my experience, it is very difficult to predict where the 
bottlenecks are in Python without many, many years of experience, and next 
to impossible if you are not familiar with Python. Your intuitions about 
what's fast and what's slow, honed from other languages, are likely to be 
very wrong when in comes to Python.

I applaud you writing different versions of code to try different tactics, 
but you should start from "write the most natural Python code you can" 
*before* you trying guessing what's fast and what's slow.


[...]
>> Looks like you are indexing a 256-element table of functions, using the
>> numeric value of the character/byte as the index... Only to then pass
>> your entire source string along with the character from it to the
>> function.
> 
> No, it passes only a reference to the entire string.

In Python terms, that *is* "the entire string".

The point is, under the hood of the interpreter, *everything* passes along 
references. We rarely draw attention to this fact. Given:

function(1.5, [])

we say "pass the float 1.5 and an empty list to function", not "pass a 
reference to the float 1.5 and a reference to an empty list to a reference 
to the function", because that would get rather annoying quickly. But of 
course the experienced developer would be aware that this is going on behind 
the scenes.

But that's the critical point: it is *behind the scenes*, an implementation 
detail. In Python code, you don't pass "a reference to" 1.5, you pass 1.5. 
Likewise for all other objects. The abstraction is that you pass objects 
around, not references to objects.



-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list

Reply via email to