Re: [Python-Dev] Python 3000: Special type for object attributes & map keys
Without an implementation and supporting profile data nobody is going to believe that you can do name lookup faster than with the built-in dict type in CPython. Note that names seen by the parser are already interned, so most of what you seem to be proposing is already implemented... On Wed, Mar 5, 2008 at 2:27 PM, Henrik Vendelbo <[EMAIL PROTECTED]> wrote: > It appears to me that if you can make mapping mechanisms faster in > Python you can make significant > overall speed improvements. I also think the proposed concept could > add flexibility to persistence formats > and RMI interfaces. > > My basic idea is to have a constant string type with an interpreter > globally unique hash. If the original constant > is created in a manner different from string constants, it can be > tracked and handled differently by the interpreter. > > Obviously most object attribute references are done with the dot > operator, so I guess the interpreter already has > an efficient mapping mechanism. But there must be a crossover with > __getattr__ etc, where a map of some sort is > used. I imagine that having a global namespace to translate attribute > names into integers could be used for several > purposes by the interpreter as well as an application exchanging > objects with other applications. > > I imagine these expressions to be supported: > * attrname(string) - creates an attrname value from the string > * int(attrname) - gets the hash value > * string(attrname) - gets the string value > > Hope this makes sense > > Henrik > > P.S. I originally thought of this in a JavaScript context so forgive > me if this would make little difference in Python. > ___ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Python 3000: Special type for object attributes & map keys
* Neal Norwitz <[EMAIL PROTECTED]> [2008-03-18 18:54:47 -0500]: > First, you should measure the current speed difference. Something like: > > $ ./python.exe -m timeit -s 'd = {1: None}' 'd[1]' > 100 loops, best of 3: 0.793 usec per loop > $ ./python.exe -m timeit -s 'd = {"1": None}' 'd["1"]' > 100 loops, best of 3: 0.728 usec per loop > > My python is a debug version, so a release version might be faster for > ints. If not, the first task would be to speed up int lookups. :-) [EMAIL PROTECTED]:~> python -V Python 2.4.5 [EMAIL PROTECTED]:~> python -m timeit -s 'd = {1: None}' 'd[1]' 1000 loops, best of 3: 0.142 usec per loop [EMAIL PROTECTED]:~> python -m timeit -s 'd = {"1": None}' 'd["1"]' 1000 loops, best of 3: 0.138 usec per loop [EMAIL PROTECTED]:~> python2.5 -V Python 2.5.2 [EMAIL PROTECTED]:~> python2.5 -m timeit -s 'd = {1: None}' 'd[1]' 1000 loops, best of 3: 0.136 usec per loop [EMAIL PROTECTED]:~> python2.5 -m timeit -s 'd = {"1": None}' 'd["1"]' 1000 loops, best of 3: 0.126 usec per loop -- mithrandi, i Ainil en-Balandor, a faer Ambar signature.asc Description: Digital signature ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Python 3000: Special type for object attributes & map keys
Greg Ewing wrote: > Neal Norwitz wrote: >> Part of this is done, but very differently in that all strings used in >> code objects are interned (stored in a dictionary > > And since two interned strings can be compared > by pointer identity, I don't see how this differs > significantly from the "unique integer" idea. > > If the integers were used to directly index an > array instead of being used as dict keys, it > might make a difference. The cost would be that > every namespace would need to be as big as > the number of names in existence, with most > of them being extremely sparse. And we already have a data structure for sparse arrays... it's called a "dict". :) If every attribute name were guaranteed to be an interned string (not currently the case - attribute names can be any kind of object), it might be possible to add another dict specialization for interned string lookup. The wins would be that lookup could assume the string's hash is valid, and equality comparison could be done via pointer comparison. HOWEVER... I think that the attribute cache patches that went into 2.6 and 3.0 mostly take care of lookup speed issues. They both assume strings are interned already. A cache hit consists of calculating a cache index from the string hash, ensuring that the attribute name at that index is identical (via pointer comparison) to the attribute name to be looked up, and returning the associated value. Lookup with an attribute that's not a string or not interned is automatically a cache miss, and it happens very rarely. Specializing dicts for interned strings would optimize the cache miss. (When I was making the 3.0 patch, I found that happened rarely on the benchmarks and regression tests. It was somewhere around 5%.) The cache miss isn't expensive just because of the dict lookup. The attribute has to be searched for in every type and super-type of the object. The occasional string equality test probably doesn't register. I'd be happy to be shown to be wrong, though. Neil ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Python 3000: Special type for object attributes & map keys
Neal Norwitz wrote: > Part of this is done, but very differently in that all strings used in > code objects are interned (stored in a dictionary And since two interned strings can be compared by pointer identity, I don't see how this differs significantly from the "unique integer" idea. If the integers were used to directly index an array instead of being used as dict keys, it might make a difference. The cost would be that every namespace would need to be as big as the number of names in existence, with most of them being extremely sparse. -- Greg ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Python 3000: Special type for object attributes & map keys
On Wed, Mar 5, 2008 at 4:27 PM, Henrik Vendelbo <[EMAIL PROTECTED]> wrote: > It appears to me that if you can make mapping mechanisms faster in > Python you can make significant > overall speed improvements. I also think the proposed concept could > add flexibility to persistence formats > and RMI interfaces. > > My basic idea is to have a constant string type with an interpreter > globally unique hash. If the original constant > is created in a manner different from string constants, it can be > tracked and handled differently by the interpreter. Part of this is done, but very differently in that all strings used in code objects are interned (stored in a dictionary so we don't increase memory by storing multiple string objects which contain the same string) . So there is partially a mechanism to do what you suggest. But there would be many places that would need to be modified. I think we briefly discussed this in the past. > P.S. I originally thought of this in a JavaScript context so forgive > me if this would make little difference in Python. Your message was a little confusing at first because the terminology is a little different, but the idea makes sense. It's not clear how much this would speed up the interpreter. The best way to test your theory would be to create a patch and measure the performance difference. First, you should measure the current speed difference. Something like: $ ./python.exe -m timeit -s 'd = {1: None}' 'd[1]' 100 loops, best of 3: 0.793 usec per loop $ ./python.exe -m timeit -s 'd = {"1": None}' 'd["1"]' 100 loops, best of 3: 0.728 usec per loop My python is a debug version, so a release version might be faster for ints. If not, the first task would be to speed up int lookups. :-) (You should verify more with real world dict sizes.) It is possible to optimize dicts with int keys since string keys are specialized in dicts, but ints are not. You would need to look in Objects/dictobject.c. See http://python.org/dev/faq/ for general hints on how to get started. If ints were faster, some of the next steps would be: * keep the globally unique number (very easy) * update the source that generates byte codes to use the globally unique number * store ints in dicts and update all the places for how they use attributes * update byte code when a module is imported to use the globally unique number Feel free to ask questions. Cheers, n ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com