Re: [Python-Dev] Python 3000: Special type for object attributes & map keys

2008-04-07 Thread Guido van Rossum
Without an implementation and supporting profile data nobody is going
to believe that you can do name lookup faster than with the built-in
dict type in CPython. Note that names seen by the parser are already
interned, so most of what you seem to be proposing is already
implemented...

On Wed, Mar 5, 2008 at 2:27 PM, Henrik Vendelbo <[EMAIL PROTECTED]> wrote:
> It appears to me that if you can make mapping mechanisms faster in
>  Python you can make significant
>  overall speed improvements. I also think the proposed concept could
>  add flexibility to persistence formats
>  and RMI interfaces.
>
>  My basic idea is to have a constant string type with an interpreter
>  globally unique hash. If the original constant
>  is created in a manner different from string constants, it can be
>  tracked and handled differently by the interpreter.
>
>  Obviously most object attribute references are done with the dot
>  operator, so I guess the interpreter already has
>  an efficient mapping mechanism. But there must be a crossover with
>  __getattr__ etc, where a map of some sort is
>  used. I imagine that having a global namespace to translate attribute
>  names into integers could be used for several
>  purposes by the interpreter as well as an application exchanging
>  objects with other applications.
>
>  I imagine these expressions to be supported:
>  * attrname(string) - creates an attrname value from the string
>  * int(attrname) - gets the hash value
>  * string(attrname) - gets the string value
>
>  Hope this makes sense
>
>  Henrik
>
>  P.S. I originally thought of this in a JavaScript context so forgive
>  me if this would make little difference in Python.
>  ___
>  Python-Dev mailing list
>  Python-Dev@python.org
>  http://mail.python.org/mailman/listinfo/python-dev
>  Unsubscribe: 
> http://mail.python.org/mailman/options/python-dev/guido%40python.org
>



-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python 3000: Special type for object attributes & map keys

2008-03-19 Thread Tristan Seligmann
* Neal Norwitz <[EMAIL PROTECTED]> [2008-03-18 18:54:47 -0500]:

> First, you should measure the current speed difference.  Something like:
> 
> $ ./python.exe -m timeit -s 'd = {1: None}' 'd[1]'
> 100 loops, best of 3: 0.793 usec per loop
> $ ./python.exe -m timeit -s 'd = {"1": None}' 'd["1"]'
> 100 loops, best of 3: 0.728 usec per loop
> 
> My python is a debug version, so a release version might be faster for
> ints.  If not, the first task would be to speed up int lookups. :-)

[EMAIL PROTECTED]:~> python -V
Python 2.4.5
[EMAIL PROTECTED]:~> python -m timeit -s 'd = {1: None}' 'd[1]'
1000 loops, best of 3: 0.142 usec per loop
[EMAIL PROTECTED]:~> python -m timeit -s 'd = {"1": None}' 'd["1"]'
1000 loops, best of 3: 0.138 usec per loop

[EMAIL PROTECTED]:~> python2.5 -V
Python 2.5.2
[EMAIL PROTECTED]:~> python2.5 -m timeit -s 'd = {1: None}' 'd[1]'
1000 loops, best of 3: 0.136 usec per loop
[EMAIL PROTECTED]:~> python2.5 -m timeit -s 'd = {"1": None}' 'd["1"]'
1000 loops, best of 3: 0.126 usec per loop
-- 
mithrandi, i Ainil en-Balandor, a faer Ambar


signature.asc
Description: Digital signature
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python 3000: Special type for object attributes & map keys

2008-03-19 Thread Neil Toronto
Greg Ewing wrote:
> Neal Norwitz wrote:
>> Part of this is done, but very differently in that all strings used in
>> code objects are interned (stored in a dictionary
> 
> And since two interned strings can be compared
> by pointer identity, I don't see how this differs
> significantly from the "unique integer" idea.
> 
> If the integers were used to directly index an
> array instead of being used as dict keys, it
> might make a difference. The cost would be that
> every namespace would need to be as big as
> the number of names in existence, with most
> of them being extremely sparse.

And we already have a data structure for sparse arrays... it's called a 
"dict". :)

If every attribute name were guaranteed to be an interned string (not 
currently the case - attribute names can be any kind of object), it 
might be possible to add another dict specialization for interned string 
lookup. The wins would be that lookup could assume the string's hash is 
valid, and equality comparison could be done via pointer comparison. 
HOWEVER...

I think that the attribute cache patches that went into 2.6 and 3.0 
mostly take care of lookup speed issues. They both assume strings are 
interned already. A cache hit consists of calculating a cache index from 
the string hash, ensuring that the attribute name at that index is 
identical (via pointer comparison) to the attribute name to be looked 
up, and returning the associated value. Lookup with an attribute that's 
not a string or not interned is automatically a cache miss, and it 
happens very rarely.

Specializing dicts for interned strings would optimize the cache miss. 
(When I was making the 3.0 patch, I found that happened rarely on the 
benchmarks and regression tests. It was somewhere around 5%.) The cache 
miss isn't expensive just because of the dict lookup. The attribute has 
to be searched for in every type and super-type of the object. The 
occasional string equality test probably doesn't register.

I'd be happy to be shown to be wrong, though.

Neil
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python 3000: Special type for object attributes & map keys

2008-03-18 Thread Greg Ewing
Neal Norwitz wrote:
> Part of this is done, but very differently in that all strings used in
> code objects are interned (stored in a dictionary

And since two interned strings can be compared
by pointer identity, I don't see how this differs
significantly from the "unique integer" idea.

If the integers were used to directly index an
array instead of being used as dict keys, it
might make a difference. The cost would be that
every namespace would need to be as big as
the number of names in existence, with most
of them being extremely sparse.

-- 
Greg
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python 3000: Special type for object attributes & map keys

2008-03-18 Thread Neal Norwitz
On Wed, Mar 5, 2008 at 4:27 PM, Henrik Vendelbo <[EMAIL PROTECTED]> wrote:
> It appears to me that if you can make mapping mechanisms faster in
>  Python you can make significant
>  overall speed improvements. I also think the proposed concept could
>  add flexibility to persistence formats
>  and RMI interfaces.
>
>  My basic idea is to have a constant string type with an interpreter
>  globally unique hash. If the original constant
>  is created in a manner different from string constants, it can be
>  tracked and handled differently by the interpreter.

Part of this is done, but very differently in that all strings used in
code objects are interned (stored in a dictionary so we don't increase
memory by storing multiple string objects which contain the same
string) .  So there is partially a mechanism to do what you suggest.
But there would be many places that would need to be modified.

I think we briefly discussed this in the past.

>  P.S. I originally thought of this in a JavaScript context so forgive
>  me if this would make little difference in Python.

Your message was a little confusing at first because the terminology
is a little different, but the idea makes sense.  It's not clear how
much this would speed up the interpreter.  The best way to test your
theory would be to create a patch and measure the performance
difference.

First, you should measure the current speed difference.  Something like:

$ ./python.exe -m timeit -s 'd = {1: None}' 'd[1]'
100 loops, best of 3: 0.793 usec per loop
$ ./python.exe -m timeit -s 'd = {"1": None}' 'd["1"]'
100 loops, best of 3: 0.728 usec per loop

My python is a debug version, so a release version might be faster for
ints.  If not, the first task would be to speed up int lookups. :-)
(You should verify more with real world dict sizes.)  It is possible
to optimize dicts with int keys since string keys are specialized in
dicts, but ints are not.  You would need to look in
Objects/dictobject.c.  See http://python.org/dev/faq/ for general
hints on how to get started.

If ints were faster, some of the next steps would be:
 * keep the globally unique number (very easy)
 * update the source that generates byte codes to use the globally unique number
 * store ints in dicts and update all the places for how they use attributes
 * update byte code when a module is imported to use the globally unique number

Feel free to ask questions.

Cheers,
n
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Python 3000: Special type for object attributes & map keys

2008-03-18 Thread Henrik Vendelbo
It appears to me that if you can make mapping mechanisms faster in  
Python you can make significant
overall speed improvements. I also think the proposed concept could  
add flexibility to persistence formats
and RMI interfaces.

My basic idea is to have a constant string type with an interpreter  
globally unique hash. If the original constant
is created in a manner different from string constants, it can be  
tracked and handled differently by the interpreter.

Obviously most object attribute references are done with the dot  
operator, so I guess the interpreter already has
an efficient mapping mechanism. But there must be a crossover with  
__getattr__ etc, where a map of some sort is
used. I imagine that having a global namespace to translate attribute  
names into integers could be used for several
purposes by the interpreter as well as an application exchanging  
objects with other applications.

I imagine these expressions to be supported:
* attrname(string) - creates an attrname value from the string
* int(attrname) - gets the hash value
* string(attrname) - gets the string value

Hope this makes sense

Henrik

P.S. I originally thought of this in a JavaScript context so forgive  
me if this would make little difference in Python.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com