Re: [Python-Dev] Making builtins more efficient

2007-04-14 Thread Steven Elliott
;`On Thu, 2007-02-22 at 01:26 +0100, Giovanni Bajo wrote: 
 On 20/02/2007 16.07, Steven Elliott wrote:
 
  I'm finally getting back into this.  I'd like to take one more shot at
  it with a revised version of what I proposed before.  
  
  For those of you that did not see the original thread it was about ways
  that accessing builtins could be more efficient.  It's a bit much to
  summarize again now, but you should be able to find it in the archive
  with this subject and a date of 2006-03-08.  
 
 Are you aware of this patch, which is still awaiting review?
 https://sourceforge.net/tracker/?func=detailatid=305470aid=1616125group_id=5470

I was not aware of your patch.  I've since downloaded it, applied it,
and played with it a bit.

I find the cached module lookups (cached lookups when loading attributes
in modules via LOAD_ATTR) to be particularly interesting since it
addresses a case where PEP 280 leaves off.  

Your idea is to have an indexable array of objects that is only used
when the hash table has not been changed, which can be determined by the
timestamps you added.  That may be the best way of handling attributes
in modules (LOAD_ATTR).  For global variables (LOAD_GLOBAL) I'm curious
how it compares to PEP 280 and or Greg Ewing's idea.

-- 
---
|  Steven Elliott  |  [EMAIL PROTECTED] |
---


___
Python-Dev mailing list
[EMAIL PROTECTED]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Making builtins more efficient

2007-02-21 Thread Steven Elliott
On Tue, 2007-02-20 at 07:48 -0800, Guido van Rossum wrote:
 If this is not a replay of an old message, please move the discussion
 to python-ideas.

It's a modified version of an old idea, so I wasn't sure where to post
it since previously it was discussed here.  I'll look into python-ideas.

-- 
---
|  Steven Elliott  |  [EMAIL PROTECTED] |
---


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Making builtins more efficient

2007-02-21 Thread Giovanni Bajo
On 20/02/2007 16.07, Steven Elliott wrote:

 I'm finally getting back into this.  I'd like to take one more shot at
 it with a revised version of what I proposed before.  
 
 For those of you that did not see the original thread it was about ways
 that accessing builtins could be more efficient.  It's a bit much to
 summarize again now, but you should be able to find it in the archive
 with this subject and a date of 2006-03-08.  

Are you aware of this patch, which is still awaiting review?
https://sourceforge.net/tracker/?func=detailatid=305470aid=1616125group_id=5470
-- 
Giovanni Bajo

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Making builtins more efficient

2007-02-20 Thread Steven Elliott
I'm finally getting back into this.  I'd like to take one more shot at
it with a revised version of what I proposed before.  

For those of you that did not see the original thread it was about ways
that accessing builtins could be more efficient.  It's a bit much to
summarize again now, but you should be able to find it in the archive
with this subject and a date of 2006-03-08.  

On Fri, 2006-03-10 at 12:46 +1300, Greg Ewing wrote: 
 Steven Elliott wrote:
  One way of handling it is to
  alter STORE_ATTR (op code for assigning to mod.str) to always check to
  see if the key being assigned is one of the default builtins.  If it is,
  then the module's indexed array of builtins is assigned to.
 
 As long as you're going to all that trouble, it
 doesn't seem like it would be much harder to treat
 all global names that way, instead of just a predefined
 set. The compiler already knows all of the names that
 are used as globals in the module's code.

What I have in mind may be close to what you are suggesting above.  My
thought now is that builtins are a set of tokens that typically, but
don't necessarily, point to the same objects in all modules.  Such
tokens, which I'll refer to as global tokens, can be roughly broken
into two sets:
1) Global tokens that typically point to the same object in all
modules.
2) Global tokens that that are likely to point to the different
objects (or be undefined) in different modules.
Set 1) is pretty much the the builtins.  True and len are likely to
point to the same objects in all modules, but not necessarily.  Set 2)
might be things like os and sys which are often defined (imported)
in modules, but not necessarily.

Access to the globals of a module, including the current module, is done
with one of three opcodes (LOAD_GLOBAL, LOAD_ATTR and LOAD_NAME).  For
each of these opcodes the following snippet of code from ceval.c (for
LOAD_GLOBAL) is relevant to this discussion:
/* This is the un-inlined version of the code above */
x = PyDict_GetItem(f-f_globals, w);
if (x == NULL) {
x = PyDict_GetItem(f-f_builtins, w);
if (x == NULL) {
  load_global_error:
format_exc_check_arg(
PyExc_NameError,
GLOBAL_NAME_ERROR_MSG, w);
break;
}
}

So, to avoid the hash table lookups above maybe the global tokens could
be assigned an index value that is fixed for any given version of the
interpreter and that is the same for all modules (that True is always
index 7, len is always index 3, etc.)

Once a set of indexes have been determined a new opcode, that I'll call
LOAD_GTOKEN, could be created that avoids the hash table lookup by
functioning in a way that is similar to LOAD_FAST (pull a local variable
value out of an array).  For example, static references to True could
always be compiled to
  LOAD_GTOKEN 7 (True)

As to set 1) and set 2) that I mentioned above - there is only a need to
distinguish between the two sets if a copy-on-write mechanism is used.
That way global tokens that are likely to have their value changed
(group 2) ) can all be together in one group so that only that group
needs to be copied when one of the global tokens is written to.  For
example code such as:
True = 1
print True
would be compiled into something like:
  1   LOAD_CONST  1 (1)
  STORE_GTOKEN1   7 (True)
  2   LOAD_GTOKEN17 (True)
  PRINT_ITEM
  PRINT_NEWLINE
Note that 1 has been appended to STORE_GTOKEN to indicate that group
1) is being worked with.  The store command will copy the array of
pointers once, the first time it is called.

Just as a new opcode is needed for LOAD_GLOBAL one would be needed for
LOAD_ATTR.  Perhaps LOAD_ATOKEN would work.  For example:
amodule.len = my_len
print amodule.len
would be compiled into something like:
  1   LOAD_GLOBAL 0 (my_len)
  LOAD_GLOBAL 1 (amodule)
  STORE_ATOKEN1   3 (len)

  2   LOAD_GLOBAL 1 (amodule)
  LOAD_ATOKEN13 (len)
  PRINT_ITEM
  PRINT_NEWLINE
  LOAD_CONST  0 (None)
  RETURN_VALUE

Note that it looks almost identical to the code that is currently
generated, but the oparg 3 shown for the LOAD_ATOKEN1 above indexes
into an array (like LOAD_FAST) to get at the attribute directly whereas
the oparg that would be shown for LOAD_ATTR is an index into an array of
constants/strings which is then used to retrieve the attribute from the
module's global hash table.

  That's great, but I'm curious if additional gains can be
  made be focusing just on builtins.
 
 As long as builtins can be shadowed, I can't see how
 to make any extra use of the fact that it's a builtin.
 A semantic change would be needed, such as forbidding
 shadowing of builtins, or at least forbidding this
 from outside the module.

I now think that it best not to think of builtins as being a special
case.  What really matters is that 

Re: [Python-Dev] Making builtins more efficient

2007-02-20 Thread Guido van Rossum
If this is not a replay of an old message, please move the discussion
to python-ideas.

On 2/20/07, Steven Elliott [EMAIL PROTECTED] wrote:
 I'm finally getting back into this.  I'd like to take one more shot at
 it with a revised version of what I proposed before.

 For those of you that did not see the original thread it was about ways
 that accessing builtins could be more efficient.  It's a bit much to
 summarize again now, but you should be able to find it in the archive
 with this subject and a date of 2006-03-08.

 On Fri, 2006-03-10 at 12:46 +1300, Greg Ewing wrote:
  Steven Elliott wrote:
   One way of handling it is to
   alter STORE_ATTR (op code for assigning to mod.str) to always check to
   see if the key being assigned is one of the default builtins.  If it is,
   then the module's indexed array of builtins is assigned to.
 
  As long as you're going to all that trouble, it
  doesn't seem like it would be much harder to treat
  all global names that way, instead of just a predefined
  set. The compiler already knows all of the names that
  are used as globals in the module's code.

 What I have in mind may be close to what you are suggesting above.  My
 thought now is that builtins are a set of tokens that typically, but
 don't necessarily, point to the same objects in all modules.  Such
 tokens, which I'll refer to as global tokens, can be roughly broken
 into two sets:
 1) Global tokens that typically point to the same object in all
 modules.
 2) Global tokens that that are likely to point to the different
 objects (or be undefined) in different modules.
 Set 1) is pretty much the the builtins.  True and len are likely to
 point to the same objects in all modules, but not necessarily.  Set 2)
 might be things like os and sys which are often defined (imported)
 in modules, but not necessarily.

 Access to the globals of a module, including the current module, is done
 with one of three opcodes (LOAD_GLOBAL, LOAD_ATTR and LOAD_NAME).  For
 each of these opcodes the following snippet of code from ceval.c (for
 LOAD_GLOBAL) is relevant to this discussion:
 /* This is the un-inlined version of the code above */
 x = PyDict_GetItem(f-f_globals, w);
 if (x == NULL) {
 x = PyDict_GetItem(f-f_builtins, w);
 if (x == NULL) {
   load_global_error:
 format_exc_check_arg(
 PyExc_NameError,
 GLOBAL_NAME_ERROR_MSG, w);
 break;
 }
 }

 So, to avoid the hash table lookups above maybe the global tokens could
 be assigned an index value that is fixed for any given version of the
 interpreter and that is the same for all modules (that True is always
 index 7, len is always index 3, etc.)

 Once a set of indexes have been determined a new opcode, that I'll call
 LOAD_GTOKEN, could be created that avoids the hash table lookup by
 functioning in a way that is similar to LOAD_FAST (pull a local variable
 value out of an array).  For example, static references to True could
 always be compiled to
   LOAD_GTOKEN 7 (True)

 As to set 1) and set 2) that I mentioned above - there is only a need to
 distinguish between the two sets if a copy-on-write mechanism is used.
 That way global tokens that are likely to have their value changed
 (group 2) ) can all be together in one group so that only that group
 needs to be copied when one of the global tokens is written to.  For
 example code such as:
 True = 1
 print True
 would be compiled into something like:
   1   LOAD_CONST  1 (1)
   STORE_GTOKEN1   7 (True)
   2   LOAD_GTOKEN17 (True)
   PRINT_ITEM
   PRINT_NEWLINE
 Note that 1 has been appended to STORE_GTOKEN to indicate that group
 1) is being worked with.  The store command will copy the array of
 pointers once, the first time it is called.

 Just as a new opcode is needed for LOAD_GLOBAL one would be needed for
 LOAD_ATTR.  Perhaps LOAD_ATOKEN would work.  For example:
 amodule.len = my_len
 print amodule.len
 would be compiled into something like:
   1   LOAD_GLOBAL 0 (my_len)
   LOAD_GLOBAL 1 (amodule)
   STORE_ATOKEN1   3 (len)

   2   LOAD_GLOBAL 1 (amodule)
   LOAD_ATOKEN13 (len)
   PRINT_ITEM
   PRINT_NEWLINE
   LOAD_CONST  0 (None)
   RETURN_VALUE

 Note that it looks almost identical to the code that is currently
 generated, but the oparg 3 shown for the LOAD_ATOKEN1 above indexes
 into an array (like LOAD_FAST) to get at the attribute directly whereas
 the oparg that would be shown for LOAD_ATTR is an index into an array of
 constants/strings which is then used to retrieve the attribute from the
 module's global hash table.

   That's great, but I'm curious if additional gains can be
   made be focusing just on builtins.
 
  As long as builtins can be shadowed, I can't see how
  to make any extra use of the fact that it's a builtin.
  A 

Re: [Python-Dev] Making builtins more efficient

2006-03-14 Thread Steven Elliott
On Thu, 2006-03-09 at 08:51 -0800, Raymond Hettinger wrote:
 [Steven Elliott]
  As you probably know each access of a builtin requires two hash table
  lookups.  First, the builtin is not found in the list of globals.  It is
  then found in the list of builtins.
 
 If someone really cared about the double lookup, they could flatten a level 
 by 
 starting their modules with:
 
from __builtin__ import *
 
 However, we don't see people writing this kind of code.  That could mean that 
 the double lookup hasn't been a big concern.

It could mean that.  I think what you are suggesting is sufficiently
cleaver that the average Python coder may not have thought of it.

In any case, many people are willing to do while 1 instead of while
True to avoid the double lookup.  And the from __builtin__ import *
additionally imposes a startup cost and memory cost (at least a word per
builtin, I would guess).

  Why not have a means of referencing the default builtins with some sort
  of index the way the LOAD_FAST op code currently works?
 
 FWIW, there was a PEP proposing a roughly similar idea, but the PEP 
 encountered 
 a great deal of resistance:
 
   http://www.python.org/doc/peps/pep-0329/
 
 When it comes time to write your PEP, it would helpful to highlight how it 
 differs from PEP 329 (i.e. implemented through the compiler rather than as a 
 bytecode hack, etc.).

I'm flattered that you think it might be worthy of a PEP.  I'll look
into doing that.

  Perhaps what I'm suggesting isn't feasible for reasons that have already
  been discussed.  But it seems like it should be possible to make while
  True as efficient as while 1.
 
 That is going to be difficult as long as it is legal to write:
 
 True = 0

LOAD_BUILTIN (or whatever we want to call it) should be as fast as
LOAD_FAST (locals) or LOAD_CONST in that they each index into an
array where the index is the argument to the opcode.  

I'll look into writing a PEP.

-- 
---
|  Steven Elliott  |  [EMAIL PROTECTED] |
---


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Making builtins more efficient

2006-03-13 Thread Samuele Pedroni
Phillip J. Eby wrote:
 At 02:47 PM 3/13/2006 -0500, Jim Jewett wrote:
 
Paul Moore wrote:


Is there any practical way of detecting and flagging
constructs like the above (remotely shadowing a
builtin in another module)?

Phillip J. Eby wrote:

the patch ended up being backed out ... too strict
of a check to be accepted for Python 2.4.

http://svn.python.org/view/python/trunk/Objects/moduleobject.c

It was revision 33054, backed out in 33084.

The patch warned about any shadowing of builtins, which
probably is too strict.
 
 
 I'm curious: what actually happened?  What things were causing warnings?

http://mail.python.org/pipermail/python-dev/2003-July/036879.html


 
 ___
 Python-Dev mailing list
 Python-Dev@python.org
 http://mail.python.org/mailman/listinfo/python-dev
 Unsubscribe: 
 http://mail.python.org/mailman/options/python-dev/pedronis%40strakt.com

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Making builtins more efficient

2006-03-13 Thread Phillip J. Eby
At 02:47 PM 3/13/2006 -0500, Jim Jewett wrote:
Paul Moore wrote:

  Is there any practical way of detecting and flagging
  constructs like the above (remotely shadowing a
  builtin in another module)?

Phillip J. Eby wrote:
  the patch ended up being backed out ... too strict
  of a check to be accepted for Python 2.4.

http://svn.python.org/view/python/trunk/Objects/moduleobject.c

It was revision 33054, backed out in 33084.

The patch warned about any shadowing of builtins, which
probably is too strict.

I'm curious: what actually happened?  What things were causing warnings?

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Making builtins more efficient

2006-03-12 Thread Greg Ewing
Guido van Rossum wrote:

 I don't see why everything that doesn't make sense to be shadowed
 ought to become a keyword.

That wasn't the reason. I was thinking it
would be nice if one could use True and False
instead of 1 and 0 in the knowledge that it
wasn't costing a couple of dictionary lookups.

However, if a more general way is found of
optimising global lookups, this may become a
non-issue.

Greg

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Making builtins more efficient

2006-03-12 Thread Greg Ewing
Steven Elliott wrote:

 a pyc file referencing a global in a module may
 have been compiled with a different version of that module (that is
 some_module.some_global can't compiled to single fixed index since
 stuff may shift around in some_module).

Not sure I quite follow that. Since the code in the .pyc
file is what sets the module up in the first place, it can
set up the array to suit its own use of global symbols.
The only way this could go wrong was if you extracted
the code out of a function compiled for one module and
surgically implanted it in another, but if you're hacking
at that level you deserve what you get.

I think it's actually possible to combine all the ideas
suggested so far. Builtins are assigned predefined indexes
for a particular bytecode version, and each module assigns
indexes its own way for other globals it uses. Builtins
can have dedicated opcodes which perform a fast check for
shadowing in the module and builtin arrays. Everything
goes as fast as possible while still allowing anything
to be overridden.

Greg
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Making builtins more efficient

2006-03-12 Thread Delaney, Timothy (Tim)
Steven Elliott wrote:

 The important difference between builtins and globals is that with
 builtins both the compiler and the runtime can enumerate all
 references 
 to builtins in a single consistent way.  That is True can always be
 builtin #3 and len can always be builtin #17, or whatever.

__slots__ for modules?

Tim Delaney
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Making builtins more efficient

2006-03-11 Thread Steven Elliott
On Fri, 2006-03-10 at 12:46 +1300, Greg Ewing wrote:
 Steven Elliott wrote:
  One way of handling it is to
  alter STORE_ATTR (op code for assigning to mod.str) to always check to
  see if the key being assigned is one of the default builtins.  If it is,
  then the module's indexed array of builtins is assigned to.
 
 As long as you're going to all that trouble, it
 doesn't seem like it would be much harder to treat
 all global names that way, instead of just a predefined
 set. The compiler already knows all of the names that
 are used as globals in the module's code.

The important difference between builtins and globals is that with
builtins both the compiler and the runtime can enumerate all references
to builtins in a single consistent way.  That is True can always be
builtin #3 and len can always be builtin #17, or whatever.  This isn't
true of globals in that a pyc file referencing a global in a module may
have been compiled with a different version of that module (that is
some_module.some_global can't compiled to single fixed index since
stuff may shift around in some_module).  With globals you have the
same kind of problem that you have with operating systems that use
ordinals to refer to symbols in shared libraries.

So in the case of a static reference to a builtin (while True, or
whatever) the compiler would generate something that refers to it with
that builtin's index (such as a new BUILTIN_OP opcode, as Philip
suggested).  Ordinary globals (non-builtins) would continue to be
generated as the same code (the LOAD_GLOBAL opcode (I'll only refer to
the loading opcodes in this email)).

In the case of a dynamic reference to a builtin (eval('True = 7') or
from foo import * or whatever) would generate the opcode that
indicates that the runtime needs to figure out what do to (the same
LOAD_NAME opcode).  The second part of the the LOAD_NAME opcode is
similar to the current LOAD_GLOBAL opcode - it first checks the hash
tables of globals and then checks the hash table of builtins.  However,
the second part of the LOAD_NAME opcode could be implemented such that
it first checks against a list of default builtins (which could be a
hash table that returns the index of that builtin) and then indexes into
the array of builtins if it is found, or retrieves it from the single
hash table of globals otherwise.  So the LOAD_NAME opcode (or similar
attempts to dynamically get a name) would almost be as efficient as it
currently it.

  That's great, but I'm curious if additional gains can be
  made be focusing just on builtins.
 
 As long as builtins can be shadowed, I can't see how
 to make any extra use of the fact that it's a builtin.
 A semantic change would be needed, such as forbidding
 shadowing of builtins, or at least forbidding this
 from outside the module.

One way of looking at is rather than having a clear distinction between
builtins and globals (as there currently is) there would be a single
global name space that internally in Python is implemented in two data
structures.  An array for frequently used names and a hash table for
infrequently used names.  And the division between the two wouldn't even
have two be between globals and builtins like we've been talking about
so far.

What distinguishes the builtins is you get them for free (initialized on
startup).  So, it would be possible to insert infrequently used builtins
into the hash table of infrequently used names only when the module
refers to it.  Conversely, names that aren't builtins, but that are used
frequently in many different modules, such as sys or os, could have
indexes set aside for for them in the array of frequently used names.
Later, when when it gets a value (because sys is imported, or
whatever) it just gets stuck into the predetermined slot in the array of
frequently used names.

Since builtins can be shadowed, as you point out, there would have to be
one array of frequently used names per module.  But often it would be
the same as other modules.  So internally, as a matter of efficiency,
the interpreter could use a copy on write strategy where a global array
of frequently used names is used by the module until it assigns to
True, or something like that, at which point it gets its own copy.

-- 
---
|  Steven Elliott  |  [EMAIL PROTECTED] |
---


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Making builtins more efficient

2006-03-10 Thread Greg Ewing
Guido van Rossum wrote:

 I don't think we should make any of these keywords.

Not even True and False? The only good reasons
I can see for anyone wanting to shadow these
are backwards compatibility ones.

Greg
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Making builtins more efficient

2006-03-10 Thread Guido van Rossum
On 3/10/06, Greg Ewing [EMAIL PROTECTED] wrote:
 Guido van Rossum wrote:

  I don't think we should make any of these keywords.

 Not even True and False? The only good reasons
 I can see for anyone wanting to shadow these
 are backwards compatibility ones.

Not even True and False.

I don't see why everything that doesn't make sense to be shadowed
ought to become a keyword. That way lies madness.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Making builtins more efficient

2006-03-10 Thread Neil Schemenauer
Guido van Rossum [EMAIL PROTECTED] wrote:
 Not even True and False.

 I don't see why everything that doesn't make sense to be shadowed
 ought to become a keyword. That way lies madness.

Have you considered whether P3K will disallow names from being
shadowed in such as way as to prevent the compiler from determining
the namespace to look for a given name?  I seen nothing in PEP 3000
related to this issue.

  Neil

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Making builtins more efficient

2006-03-10 Thread Guido van Rossum
On 3/10/06, Neil Schemenauer [EMAIL PROTECTED] wrote:
 Guido van Rossum [EMAIL PROTECTED] wrote:
  Not even True and False.
 
  I don't see why everything that doesn't make sense to be shadowed
  ought to become a keyword. That way lies madness.

 Have you considered whether P3K will disallow names from being
 shadowed in such as way as to prevent the compiler from determining
 the namespace to look for a given name?  I seen nothing in PEP 3000
 related to this issue.

Yes, that's one of my goals. PEP 280 (clumsily) expresses some of the ideas.

Currently, my main idea is to allow the compiler to assume that if a
known built-in is used in a module, and inspection of the source code
for that module reveals no assignment to a global with the same name,
the built-in can be assumed to have the standard semantics, and may be
hardcoded in any way the compiler seems fit (for example, an opcode
that computes len() would be fair game). Exceptions would be made for
'open' and '__import__' (since these are commonly hacked).

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Making builtins more efficient

2006-03-09 Thread Paul Moore
On 3/9/06, Nick Coghlan [EMAIL PROTECTED] wrote:
 Steven Elliott wrote:
  I'm interested in how builtins could be more efficient.  I've read over
  some of the PEPs having to do with making global variables more
  efficient (search for global):
  http://www.python.org/doc/essays/pepparade.html
  But I think the problem can be simplified by focusing strictly on
  builtins.

 Unfortunately, builtins can currently be shadowed in the module global
 namespace from outside the module (via constructs like import mod; mod.str =
 my_str). Unless/until that becomes illegal, focusing solely on builtins
 doesn't help - the difficulties lie in optimising builtin access while
 preserving the existing name shadowing semantics.

Is there any practical way of detecting and flagging constructs like
the above (remotely shadowing a builtin in another module)? I can't
see a way of doing it (but I know very little about this area...).

If it *is* possible, I'd say it's worth implementing at least a
warning sooner rather than later - the practice seems questionable at
best, and any progress towards outlawing it would help in work on
optimising builtins.

Paul.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Making builtins more efficient

2006-03-09 Thread Steven Elliott
On Thu, 2006-03-09 at 12:00 +, Paul Moore wrote:
 On 3/9/06, Nick Coghlan [EMAIL PROTECTED] wrote:
  Steven Elliott wrote:
   I'm interested in how builtins could be more efficient.  I've read over
   some of the PEPs having to do with making global variables more
   efficient (search for global):
   http://www.python.org/doc/essays/pepparade.html
   But I think the problem can be simplified by focusing strictly on
   builtins.
 
  Unfortunately, builtins can currently be shadowed in the module global
  namespace from outside the module (via constructs like import mod; mod.str 
  =
  my_str). Unless/until that becomes illegal, focusing solely on builtins
  doesn't help - the difficulties lie in optimising builtin access while
  preserving the existing name shadowing semantics.
 
 Is there any practical way of detecting and flagging constructs like
 the above (remotely shadowing a builtin in another module)? I can't
 see a way of doing it (but I know very little about this area...).

It may be possible to flag it, or it may be possible it make it work.

In my post I mentioned one special case that needs to be addressed
(assigning to __builtins__).  What Nick mentioned in his post (import
mod; mod.str = my_str) is another special case that needs to be
addressed.  If we can assume that all pyc files are compiled with the
same set of default bulitins (which should be assured by the by the
version in the pyc file) then there are two ways that things like
mod.str = my_str could be handled.

I believe that currently mod.str = my_str alters the module's global
hash table (f-f_globals in the code).  One way of handling it is to
alter STORE_ATTR (op code for assigning to mod.str) to always check to
see if the key being assigned is one of the default builtins.  If it is,
then the module's indexed array of builtins is assigned to.

Alternatively if we also wanted to optimize mod.str = my_str then
there could be a new opcode like STORE_ATTR that would take an index
into the array of builtins instead of an index into the names.

PEP 280, which Nick mentioned, talks about a cells, a hybrid data
structure that can do both hash table lookups and lookups by index
efficiently.  That's great, but I'm curious if additional gains can be
made be focusing just on builtins.

-- 
---
|  Steven Elliott  |  [EMAIL PROTECTED] |
---


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Making builtins more efficient

2006-03-09 Thread Phillip J. Eby
At 08:50 AM 3/9/2006 -0600, Steven Elliott wrote:
I believe that currently mod.str = my_str alters the module's global
hash table (f-f_globals in the code).  One way of handling it is to
alter STORE_ATTR (op code for assigning to mod.str) to always check to
see if the key being assigned is one of the default builtins.  If it is,
then the module's indexed array of builtins is assigned to.

It's not the opcode that would change, it's the C function referenced by 
the module type's tp_setattro function slot.  This has already been 
attempted before, in order to implement a warning for this behavior.  You 
might want to research that, because the patch ended up being backed out 
for some reason.  I think it ended up being too strict of a check to be 
accepted for Python 2.4.

If some version of it could come back, however, it's possible we could use 
this in Python 2.5 to allow -O compilation to assume that unshadowed 
builtins are static, making it potentially possible to convert some of them 
to specialized bytecodes, or perhaps a BUILTIN_OP n bytecode, where 'n' 
is a combination of an operation selector and the number of arguments to be 
taken from the stack.

The determination of unshadowed would have to be conservative, in that 
'from foo import *' might shadow a builtin, so using 'import *' would 
disable optimization of all builtins.  However, if that were not present, 
and there's no statically detectable assignment shadowing a particular 
builtin, that builtin could be optimized.  (Assuming -O, of course.)

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Making builtins more efficient

2006-03-09 Thread Raymond Hettinger
[Steven Elliott]
 As you probably know each access of a builtin requires two hash table
 lookups.  First, the builtin is not found in the list of globals.  It is
 then found in the list of builtins.

If someone really cared about the double lookup, they could flatten a level by 
starting their modules with:

   from __builtin__ import *

However, we don't see people writing this kind of code.  That could mean that 
the double lookup hasn't been a big concern.


 Why not have a means of referencing the default builtins with some sort
 of index the way the LOAD_FAST op code currently works?

FWIW, there was a PEP proposing a roughly similar idea, but the PEP encountered 
a great deal of resistance:

  http://www.python.org/doc/peps/pep-0329/

When it comes time to write your PEP, it would helpful to highlight how it 
differs from PEP 329 (i.e. implemented through the compiler rather than as a 
bytecode hack, etc.).



 Perhaps what I'm suggesting isn't feasible for reasons that have already
 been discussed.  But it seems like it should be possible to make while
 True as efficient as while 1.

That is going to be difficult as long as it is legal to write:

True = 0


Raymond 

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Making builtins more efficient

2006-03-09 Thread Josiah Carlson

Steven Elliott [EMAIL PROTECTED] wrote:
 I'm interested in how builtins could be more efficient.  I've read over
 some of the PEPs having to do with making global variables more
 efficient (search for global):
 http://www.python.org/doc/essays/pepparade.html
 But I think the problem can be simplified by focusing strictly on
 builtins.


Whether or not it is a good idea, it is not wholly uncommon for users to
manipulate variables in the builtin or other module namespaces.  From
very simple import hooks replacing __import__, to module.QUIT = 1, these
things happen, and I don't think that making them produce a warning as
suggested by Paul, necessarily helps us.

I personally think the problem can be simplified (and solved) by
applying decorators where such builtin lookups could/should be more
efficient.  Raymond has a recipe for doing such optimization over at the
Python cookbook:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/277940

Toss that little guy into a decorators module; if people want their
runtime calling to be fast (at the expense of a slightly increased
compilation time), they'll start using it.

 - Josiah

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Making builtins more efficient

2006-03-09 Thread André Malo
* Paul Moore wrote:

 If it *is* possible, I'd say it's worth implementing at least a
 warning sooner rather than later - the practice seems questionable at
 best, and any progress towards outlawing it would help in work on
 optimising builtins.

/lurk

FWIW, this practice is very handy for unit tests. For example, I often 
shadow builtins like ``file`` for my tests with a mock placed from the 
outside into the global namespace.

nd

lurk
-- 
Das Verhalten von Gates hatte mir bewiesen, dass ich auf ihn und seine
beiden Gefährten nicht zu zählen brauchte -- Karl May, Winnetou III

Im Westen was neues: http://pub.perlig.de/books.html#apache2
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Making builtins more efficient

2006-03-09 Thread Phillip J. Eby
At 08:51 AM 3/9/2006 -0800, Raymond Hettinger wrote:
  Perhaps what I'm suggesting isn't feasible for reasons that have already
  been discussed.  But it seems like it should be possible to make while
  True as efficient as while 1.

That is going to be difficult as long as it is legal to write:

 True = 0

In that case, it can obviously be determined statically that the builtin is 
shadowed, and optimization is impossible for that builtin in that module.

What *is* difficult is the somemodule.True = 0 case, and the 
somemodule.__dict__['True']=0 case, and the globals()['True'] = 0 case, and 
other such fiddling.

Personally, I think that such shenanigans (in the case where somemodule 
doesn't already have a module-level binding for the builtin) should not be 
allowed to change the semantics of optimized functions using that 
dictionary as a global namespace.

I think this is a reasonable implementation limit on dynamicity, although 
to be conservative I think we should only do this with -O in Python 2.5, if 
we do it at all.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Making builtins more efficient

2006-03-09 Thread Just van Rossum
Phillip J. Eby wrote:

 I think this is a reasonable implementation limit on dynamicity,

Absolutely.

 although to be conservative I think we should only do this with -O in
 Python 2.5, if we do it at all.

Or with a __future__ directive?

Just
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Making builtins more efficient

2006-03-09 Thread Greg Ewing
Steven Elliott wrote:
 One way of handling it is to
 alter STORE_ATTR (op code for assigning to mod.str) to always check to
 see if the key being assigned is one of the default builtins.  If it is,
 then the module's indexed array of builtins is assigned to.

As long as you're going to all that trouble, it
doesn't seem like it would be much harder to treat
all global names that way, instead of just a predefined
set. The compiler already knows all of the names that
are used as globals in the module's code.

 That's great, but I'm curious if additional gains can be
 made be focusing just on builtins.

As long as builtins can be shadowed, I can't see how
to make any extra use of the fact that it's a builtin.
A semantic change would be needed, such as forbidding
shadowing of builtins, or at least forbidding this
from outside the module.

Greg

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Making builtins more efficient

2006-03-09 Thread Greg Ewing
Raymond Hettinger wrote:

 That is going to be difficult as long as it is legal to write:
 
 True = 0

BTW, are there any plans to make True and False hard
constants in 3.0 (like None is now)? Maybe also
others like Ellipsis, NotImplemented, etc.

Greg
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Making builtins more efficient

2006-03-09 Thread Guido van Rossum
On 3/9/06, Greg Ewing [EMAIL PROTECTED] wrote:
 BTW, are there any plans to make True and False hard
 constants in 3.0 (like None is now)? Maybe also
 others like Ellipsis, NotImplemented, etc.

I don't think we should make any of these keywords.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Making builtins more efficient

2006-03-09 Thread Phillip J. Eby
At 12:46 PM 3/10/2006 +1300, Greg Ewing wrote:
Steven Elliott wrote:
  One way of handling it is to
  alter STORE_ATTR (op code for assigning to mod.str) to always check to
  see if the key being assigned is one of the default builtins.  If it is,
  then the module's indexed array of builtins is assigned to.

As long as you're going to all that trouble, it
doesn't seem like it would be much harder to treat
all global names that way, instead of just a predefined
set. The compiler already knows all of the names that
are used as globals in the module's code.

But knowing that an operation is a builtin allows for the possibility of 
invoking the equivalent C operation directly in the interpreter (e.g. via a 
jump table), thus letting us translate something like len(foo) from:

 LOAD_GLOBAL   len
 LOAD_FAST foo
 CALL_FUNCTION 1

into something like:

 LOAD_FAST  foo
 BUILTIN_OP len, 1

And, the BUILTIN_OP could invoke a C function passing in a pointer to the 
arguments on the stack, so as to bypass tuple allocation, and the C 
function would use PySequence_Length() rather than going through the Python 
calling convention to PyObject_Call() the 'len' object.

Now, whether that'll actually speed real programs up much, I don't 
know.  But there are certainly a lot of things being cut out of the process 
using this approach, including an opcode fetch/decode, two dictionary 
lookups (one failing, one successful), and perhaps some tuplefying (only 
for C funcs w argcount1, since those builtins don't need an argtuple IIRC).

And, even if it does speed things up a good deal, there's still a question 
of whether it should be done, when some real systems hack modules' builtins 
for testing.  However, if BUILTIN_OP were to verify at runtime that 
__builtins__ is the interpreter standard builtins (i.e., the 
restricted-mode check), then it could dynamically choose to do the slower 
operation lookup.  That would allow you to hack a module's builtins by 
giving it a fresh __builtins__ dictionary to implement that kind of 
monkeypatching.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com