Re: [Python-Dev] Making builtins more efficient
;`On Thu, 2007-02-22 at 01:26 +0100, Giovanni Bajo wrote: On 20/02/2007 16.07, Steven Elliott wrote: I'm finally getting back into this. I'd like to take one more shot at it with a revised version of what I proposed before. For those of you that did not see the original thread it was about ways that accessing builtins could be more efficient. It's a bit much to summarize again now, but you should be able to find it in the archive with this subject and a date of 2006-03-08. Are you aware of this patch, which is still awaiting review? https://sourceforge.net/tracker/?func=detailatid=305470aid=1616125group_id=5470 I was not aware of your patch. I've since downloaded it, applied it, and played with it a bit. I find the cached module lookups (cached lookups when loading attributes in modules via LOAD_ATTR) to be particularly interesting since it addresses a case where PEP 280 leaves off. Your idea is to have an indexable array of objects that is only used when the hash table has not been changed, which can be determined by the timestamps you added. That may be the best way of handling attributes in modules (LOAD_ATTR). For global variables (LOAD_GLOBAL) I'm curious how it compares to PEP 280 and or Greg Ewing's idea. -- --- | Steven Elliott | [EMAIL PROTECTED] | --- ___ Python-Dev mailing list [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Making builtins more efficient
On Tue, 2007-02-20 at 07:48 -0800, Guido van Rossum wrote: If this is not a replay of an old message, please move the discussion to python-ideas. It's a modified version of an old idea, so I wasn't sure where to post it since previously it was discussed here. I'll look into python-ideas. -- --- | Steven Elliott | [EMAIL PROTECTED] | --- ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Making builtins more efficient
On 20/02/2007 16.07, Steven Elliott wrote: I'm finally getting back into this. I'd like to take one more shot at it with a revised version of what I proposed before. For those of you that did not see the original thread it was about ways that accessing builtins could be more efficient. It's a bit much to summarize again now, but you should be able to find it in the archive with this subject and a date of 2006-03-08. Are you aware of this patch, which is still awaiting review? https://sourceforge.net/tracker/?func=detailatid=305470aid=1616125group_id=5470 -- Giovanni Bajo ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Making builtins more efficient
I'm finally getting back into this. I'd like to take one more shot at it with a revised version of what I proposed before. For those of you that did not see the original thread it was about ways that accessing builtins could be more efficient. It's a bit much to summarize again now, but you should be able to find it in the archive with this subject and a date of 2006-03-08. On Fri, 2006-03-10 at 12:46 +1300, Greg Ewing wrote: Steven Elliott wrote: One way of handling it is to alter STORE_ATTR (op code for assigning to mod.str) to always check to see if the key being assigned is one of the default builtins. If it is, then the module's indexed array of builtins is assigned to. As long as you're going to all that trouble, it doesn't seem like it would be much harder to treat all global names that way, instead of just a predefined set. The compiler already knows all of the names that are used as globals in the module's code. What I have in mind may be close to what you are suggesting above. My thought now is that builtins are a set of tokens that typically, but don't necessarily, point to the same objects in all modules. Such tokens, which I'll refer to as global tokens, can be roughly broken into two sets: 1) Global tokens that typically point to the same object in all modules. 2) Global tokens that that are likely to point to the different objects (or be undefined) in different modules. Set 1) is pretty much the the builtins. True and len are likely to point to the same objects in all modules, but not necessarily. Set 2) might be things like os and sys which are often defined (imported) in modules, but not necessarily. Access to the globals of a module, including the current module, is done with one of three opcodes (LOAD_GLOBAL, LOAD_ATTR and LOAD_NAME). For each of these opcodes the following snippet of code from ceval.c (for LOAD_GLOBAL) is relevant to this discussion: /* This is the un-inlined version of the code above */ x = PyDict_GetItem(f-f_globals, w); if (x == NULL) { x = PyDict_GetItem(f-f_builtins, w); if (x == NULL) { load_global_error: format_exc_check_arg( PyExc_NameError, GLOBAL_NAME_ERROR_MSG, w); break; } } So, to avoid the hash table lookups above maybe the global tokens could be assigned an index value that is fixed for any given version of the interpreter and that is the same for all modules (that True is always index 7, len is always index 3, etc.) Once a set of indexes have been determined a new opcode, that I'll call LOAD_GTOKEN, could be created that avoids the hash table lookup by functioning in a way that is similar to LOAD_FAST (pull a local variable value out of an array). For example, static references to True could always be compiled to LOAD_GTOKEN 7 (True) As to set 1) and set 2) that I mentioned above - there is only a need to distinguish between the two sets if a copy-on-write mechanism is used. That way global tokens that are likely to have their value changed (group 2) ) can all be together in one group so that only that group needs to be copied when one of the global tokens is written to. For example code such as: True = 1 print True would be compiled into something like: 1 LOAD_CONST 1 (1) STORE_GTOKEN1 7 (True) 2 LOAD_GTOKEN17 (True) PRINT_ITEM PRINT_NEWLINE Note that 1 has been appended to STORE_GTOKEN to indicate that group 1) is being worked with. The store command will copy the array of pointers once, the first time it is called. Just as a new opcode is needed for LOAD_GLOBAL one would be needed for LOAD_ATTR. Perhaps LOAD_ATOKEN would work. For example: amodule.len = my_len print amodule.len would be compiled into something like: 1 LOAD_GLOBAL 0 (my_len) LOAD_GLOBAL 1 (amodule) STORE_ATOKEN1 3 (len) 2 LOAD_GLOBAL 1 (amodule) LOAD_ATOKEN13 (len) PRINT_ITEM PRINT_NEWLINE LOAD_CONST 0 (None) RETURN_VALUE Note that it looks almost identical to the code that is currently generated, but the oparg 3 shown for the LOAD_ATOKEN1 above indexes into an array (like LOAD_FAST) to get at the attribute directly whereas the oparg that would be shown for LOAD_ATTR is an index into an array of constants/strings which is then used to retrieve the attribute from the module's global hash table. That's great, but I'm curious if additional gains can be made be focusing just on builtins. As long as builtins can be shadowed, I can't see how to make any extra use of the fact that it's a builtin. A semantic change would be needed, such as forbidding shadowing of builtins, or at least forbidding this from outside the module. I now think that it best not to think of builtins as being a special case. What really matters is that
Re: [Python-Dev] Making builtins more efficient
If this is not a replay of an old message, please move the discussion to python-ideas. On 2/20/07, Steven Elliott [EMAIL PROTECTED] wrote: I'm finally getting back into this. I'd like to take one more shot at it with a revised version of what I proposed before. For those of you that did not see the original thread it was about ways that accessing builtins could be more efficient. It's a bit much to summarize again now, but you should be able to find it in the archive with this subject and a date of 2006-03-08. On Fri, 2006-03-10 at 12:46 +1300, Greg Ewing wrote: Steven Elliott wrote: One way of handling it is to alter STORE_ATTR (op code for assigning to mod.str) to always check to see if the key being assigned is one of the default builtins. If it is, then the module's indexed array of builtins is assigned to. As long as you're going to all that trouble, it doesn't seem like it would be much harder to treat all global names that way, instead of just a predefined set. The compiler already knows all of the names that are used as globals in the module's code. What I have in mind may be close to what you are suggesting above. My thought now is that builtins are a set of tokens that typically, but don't necessarily, point to the same objects in all modules. Such tokens, which I'll refer to as global tokens, can be roughly broken into two sets: 1) Global tokens that typically point to the same object in all modules. 2) Global tokens that that are likely to point to the different objects (or be undefined) in different modules. Set 1) is pretty much the the builtins. True and len are likely to point to the same objects in all modules, but not necessarily. Set 2) might be things like os and sys which are often defined (imported) in modules, but not necessarily. Access to the globals of a module, including the current module, is done with one of three opcodes (LOAD_GLOBAL, LOAD_ATTR and LOAD_NAME). For each of these opcodes the following snippet of code from ceval.c (for LOAD_GLOBAL) is relevant to this discussion: /* This is the un-inlined version of the code above */ x = PyDict_GetItem(f-f_globals, w); if (x == NULL) { x = PyDict_GetItem(f-f_builtins, w); if (x == NULL) { load_global_error: format_exc_check_arg( PyExc_NameError, GLOBAL_NAME_ERROR_MSG, w); break; } } So, to avoid the hash table lookups above maybe the global tokens could be assigned an index value that is fixed for any given version of the interpreter and that is the same for all modules (that True is always index 7, len is always index 3, etc.) Once a set of indexes have been determined a new opcode, that I'll call LOAD_GTOKEN, could be created that avoids the hash table lookup by functioning in a way that is similar to LOAD_FAST (pull a local variable value out of an array). For example, static references to True could always be compiled to LOAD_GTOKEN 7 (True) As to set 1) and set 2) that I mentioned above - there is only a need to distinguish between the two sets if a copy-on-write mechanism is used. That way global tokens that are likely to have their value changed (group 2) ) can all be together in one group so that only that group needs to be copied when one of the global tokens is written to. For example code such as: True = 1 print True would be compiled into something like: 1 LOAD_CONST 1 (1) STORE_GTOKEN1 7 (True) 2 LOAD_GTOKEN17 (True) PRINT_ITEM PRINT_NEWLINE Note that 1 has been appended to STORE_GTOKEN to indicate that group 1) is being worked with. The store command will copy the array of pointers once, the first time it is called. Just as a new opcode is needed for LOAD_GLOBAL one would be needed for LOAD_ATTR. Perhaps LOAD_ATOKEN would work. For example: amodule.len = my_len print amodule.len would be compiled into something like: 1 LOAD_GLOBAL 0 (my_len) LOAD_GLOBAL 1 (amodule) STORE_ATOKEN1 3 (len) 2 LOAD_GLOBAL 1 (amodule) LOAD_ATOKEN13 (len) PRINT_ITEM PRINT_NEWLINE LOAD_CONST 0 (None) RETURN_VALUE Note that it looks almost identical to the code that is currently generated, but the oparg 3 shown for the LOAD_ATOKEN1 above indexes into an array (like LOAD_FAST) to get at the attribute directly whereas the oparg that would be shown for LOAD_ATTR is an index into an array of constants/strings which is then used to retrieve the attribute from the module's global hash table. That's great, but I'm curious if additional gains can be made be focusing just on builtins. As long as builtins can be shadowed, I can't see how to make any extra use of the fact that it's a builtin. A
Re: [Python-Dev] Making builtins more efficient
On Thu, 2006-03-09 at 08:51 -0800, Raymond Hettinger wrote: [Steven Elliott] As you probably know each access of a builtin requires two hash table lookups. First, the builtin is not found in the list of globals. It is then found in the list of builtins. If someone really cared about the double lookup, they could flatten a level by starting their modules with: from __builtin__ import * However, we don't see people writing this kind of code. That could mean that the double lookup hasn't been a big concern. It could mean that. I think what you are suggesting is sufficiently cleaver that the average Python coder may not have thought of it. In any case, many people are willing to do while 1 instead of while True to avoid the double lookup. And the from __builtin__ import * additionally imposes a startup cost and memory cost (at least a word per builtin, I would guess). Why not have a means of referencing the default builtins with some sort of index the way the LOAD_FAST op code currently works? FWIW, there was a PEP proposing a roughly similar idea, but the PEP encountered a great deal of resistance: http://www.python.org/doc/peps/pep-0329/ When it comes time to write your PEP, it would helpful to highlight how it differs from PEP 329 (i.e. implemented through the compiler rather than as a bytecode hack, etc.). I'm flattered that you think it might be worthy of a PEP. I'll look into doing that. Perhaps what I'm suggesting isn't feasible for reasons that have already been discussed. But it seems like it should be possible to make while True as efficient as while 1. That is going to be difficult as long as it is legal to write: True = 0 LOAD_BUILTIN (or whatever we want to call it) should be as fast as LOAD_FAST (locals) or LOAD_CONST in that they each index into an array where the index is the argument to the opcode. I'll look into writing a PEP. -- --- | Steven Elliott | [EMAIL PROTECTED] | --- ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] Making builtins more efficient
Paul Moore wrote: Is there any practical way of detecting and flagging constructs like the above (remotely shadowing a builtin in another module)? Phillip J. Eby wrote: the patch ended up being backed out ... too strict of a check to be accepted for Python 2.4. http://svn.python.org/view/python/trunk/Objects/moduleobject.c It was revision 33054, backed out in 33084. The patch warned about any shadowing of builtins, which probably is too strict. An alternative might be to warn only about modifying *another* module's globals. (And perhaps not just when they shadow builtins?) For example, modules could grow a __sealed__ attribute which gets set at the end of the import; instead of using PyObject_GenericSetAttr directly, the tp_setattro slot would check the __sealed__ attribute (and maybe squawk) before deferring. -jJ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Making builtins more efficient
Phillip J. Eby wrote: At 02:47 PM 3/13/2006 -0500, Jim Jewett wrote: Paul Moore wrote: Is there any practical way of detecting and flagging constructs like the above (remotely shadowing a builtin in another module)? Phillip J. Eby wrote: the patch ended up being backed out ... too strict of a check to be accepted for Python 2.4. http://svn.python.org/view/python/trunk/Objects/moduleobject.c It was revision 33054, backed out in 33084. The patch warned about any shadowing of builtins, which probably is too strict. I'm curious: what actually happened? What things were causing warnings? http://mail.python.org/pipermail/python-dev/2003-July/036879.html ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/pedronis%40strakt.com ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Making builtins more efficient
At 02:47 PM 3/13/2006 -0500, Jim Jewett wrote: Paul Moore wrote: Is there any practical way of detecting and flagging constructs like the above (remotely shadowing a builtin in another module)? Phillip J. Eby wrote: the patch ended up being backed out ... too strict of a check to be accepted for Python 2.4. http://svn.python.org/view/python/trunk/Objects/moduleobject.c It was revision 33054, backed out in 33084. The patch warned about any shadowing of builtins, which probably is too strict. I'm curious: what actually happened? What things were causing warnings? ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Making builtins more efficient
Guido van Rossum wrote: I don't see why everything that doesn't make sense to be shadowed ought to become a keyword. That wasn't the reason. I was thinking it would be nice if one could use True and False instead of 1 and 0 in the knowledge that it wasn't costing a couple of dictionary lookups. However, if a more general way is found of optimising global lookups, this may become a non-issue. Greg ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Making builtins more efficient
Steven Elliott wrote: a pyc file referencing a global in a module may have been compiled with a different version of that module (that is some_module.some_global can't compiled to single fixed index since stuff may shift around in some_module). Not sure I quite follow that. Since the code in the .pyc file is what sets the module up in the first place, it can set up the array to suit its own use of global symbols. The only way this could go wrong was if you extracted the code out of a function compiled for one module and surgically implanted it in another, but if you're hacking at that level you deserve what you get. I think it's actually possible to combine all the ideas suggested so far. Builtins are assigned predefined indexes for a particular bytecode version, and each module assigns indexes its own way for other globals it uses. Builtins can have dedicated opcodes which perform a fast check for shadowing in the module and builtin arrays. Everything goes as fast as possible while still allowing anything to be overridden. Greg ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Making builtins more efficient
Steven Elliott wrote: The important difference between builtins and globals is that with builtins both the compiler and the runtime can enumerate all references to builtins in a single consistent way. That is True can always be builtin #3 and len can always be builtin #17, or whatever. __slots__ for modules? Tim Delaney ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Making builtins more efficient
On Fri, 2006-03-10 at 12:46 +1300, Greg Ewing wrote: Steven Elliott wrote: One way of handling it is to alter STORE_ATTR (op code for assigning to mod.str) to always check to see if the key being assigned is one of the default builtins. If it is, then the module's indexed array of builtins is assigned to. As long as you're going to all that trouble, it doesn't seem like it would be much harder to treat all global names that way, instead of just a predefined set. The compiler already knows all of the names that are used as globals in the module's code. The important difference between builtins and globals is that with builtins both the compiler and the runtime can enumerate all references to builtins in a single consistent way. That is True can always be builtin #3 and len can always be builtin #17, or whatever. This isn't true of globals in that a pyc file referencing a global in a module may have been compiled with a different version of that module (that is some_module.some_global can't compiled to single fixed index since stuff may shift around in some_module). With globals you have the same kind of problem that you have with operating systems that use ordinals to refer to symbols in shared libraries. So in the case of a static reference to a builtin (while True, or whatever) the compiler would generate something that refers to it with that builtin's index (such as a new BUILTIN_OP opcode, as Philip suggested). Ordinary globals (non-builtins) would continue to be generated as the same code (the LOAD_GLOBAL opcode (I'll only refer to the loading opcodes in this email)). In the case of a dynamic reference to a builtin (eval('True = 7') or from foo import * or whatever) would generate the opcode that indicates that the runtime needs to figure out what do to (the same LOAD_NAME opcode). The second part of the the LOAD_NAME opcode is similar to the current LOAD_GLOBAL opcode - it first checks the hash tables of globals and then checks the hash table of builtins. However, the second part of the LOAD_NAME opcode could be implemented such that it first checks against a list of default builtins (which could be a hash table that returns the index of that builtin) and then indexes into the array of builtins if it is found, or retrieves it from the single hash table of globals otherwise. So the LOAD_NAME opcode (or similar attempts to dynamically get a name) would almost be as efficient as it currently it. That's great, but I'm curious if additional gains can be made be focusing just on builtins. As long as builtins can be shadowed, I can't see how to make any extra use of the fact that it's a builtin. A semantic change would be needed, such as forbidding shadowing of builtins, or at least forbidding this from outside the module. One way of looking at is rather than having a clear distinction between builtins and globals (as there currently is) there would be a single global name space that internally in Python is implemented in two data structures. An array for frequently used names and a hash table for infrequently used names. And the division between the two wouldn't even have two be between globals and builtins like we've been talking about so far. What distinguishes the builtins is you get them for free (initialized on startup). So, it would be possible to insert infrequently used builtins into the hash table of infrequently used names only when the module refers to it. Conversely, names that aren't builtins, but that are used frequently in many different modules, such as sys or os, could have indexes set aside for for them in the array of frequently used names. Later, when when it gets a value (because sys is imported, or whatever) it just gets stuck into the predetermined slot in the array of frequently used names. Since builtins can be shadowed, as you point out, there would have to be one array of frequently used names per module. But often it would be the same as other modules. So internally, as a matter of efficiency, the interpreter could use a copy on write strategy where a global array of frequently used names is used by the module until it assigns to True, or something like that, at which point it gets its own copy. -- --- | Steven Elliott | [EMAIL PROTECTED] | --- ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Making builtins more efficient
Guido van Rossum wrote: I don't think we should make any of these keywords. Not even True and False? The only good reasons I can see for anyone wanting to shadow these are backwards compatibility ones. Greg ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Making builtins more efficient
On 3/10/06, Greg Ewing [EMAIL PROTECTED] wrote: Guido van Rossum wrote: I don't think we should make any of these keywords. Not even True and False? The only good reasons I can see for anyone wanting to shadow these are backwards compatibility ones. Not even True and False. I don't see why everything that doesn't make sense to be shadowed ought to become a keyword. That way lies madness. -- --Guido van Rossum (home page: http://www.python.org/~guido/) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Making builtins more efficient
Guido van Rossum [EMAIL PROTECTED] wrote: Not even True and False. I don't see why everything that doesn't make sense to be shadowed ought to become a keyword. That way lies madness. Have you considered whether P3K will disallow names from being shadowed in such as way as to prevent the compiler from determining the namespace to look for a given name? I seen nothing in PEP 3000 related to this issue. Neil ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Making builtins more efficient
On 3/10/06, Neil Schemenauer [EMAIL PROTECTED] wrote: Guido van Rossum [EMAIL PROTECTED] wrote: Not even True and False. I don't see why everything that doesn't make sense to be shadowed ought to become a keyword. That way lies madness. Have you considered whether P3K will disallow names from being shadowed in such as way as to prevent the compiler from determining the namespace to look for a given name? I seen nothing in PEP 3000 related to this issue. Yes, that's one of my goals. PEP 280 (clumsily) expresses some of the ideas. Currently, my main idea is to allow the compiler to assume that if a known built-in is used in a module, and inspection of the source code for that module reveals no assignment to a global with the same name, the built-in can be assumed to have the standard semantics, and may be hardcoded in any way the compiler seems fit (for example, an opcode that computes len() would be fair game). Exceptions would be made for 'open' and '__import__' (since these are commonly hacked). -- --Guido van Rossum (home page: http://www.python.org/~guido/) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Making builtins more efficient
On 3/9/06, Nick Coghlan [EMAIL PROTECTED] wrote: Steven Elliott wrote: I'm interested in how builtins could be more efficient. I've read over some of the PEPs having to do with making global variables more efficient (search for global): http://www.python.org/doc/essays/pepparade.html But I think the problem can be simplified by focusing strictly on builtins. Unfortunately, builtins can currently be shadowed in the module global namespace from outside the module (via constructs like import mod; mod.str = my_str). Unless/until that becomes illegal, focusing solely on builtins doesn't help - the difficulties lie in optimising builtin access while preserving the existing name shadowing semantics. Is there any practical way of detecting and flagging constructs like the above (remotely shadowing a builtin in another module)? I can't see a way of doing it (but I know very little about this area...). If it *is* possible, I'd say it's worth implementing at least a warning sooner rather than later - the practice seems questionable at best, and any progress towards outlawing it would help in work on optimising builtins. Paul. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Making builtins more efficient
On Thu, 2006-03-09 at 12:00 +, Paul Moore wrote: On 3/9/06, Nick Coghlan [EMAIL PROTECTED] wrote: Steven Elliott wrote: I'm interested in how builtins could be more efficient. I've read over some of the PEPs having to do with making global variables more efficient (search for global): http://www.python.org/doc/essays/pepparade.html But I think the problem can be simplified by focusing strictly on builtins. Unfortunately, builtins can currently be shadowed in the module global namespace from outside the module (via constructs like import mod; mod.str = my_str). Unless/until that becomes illegal, focusing solely on builtins doesn't help - the difficulties lie in optimising builtin access while preserving the existing name shadowing semantics. Is there any practical way of detecting and flagging constructs like the above (remotely shadowing a builtin in another module)? I can't see a way of doing it (but I know very little about this area...). It may be possible to flag it, or it may be possible it make it work. In my post I mentioned one special case that needs to be addressed (assigning to __builtins__). What Nick mentioned in his post (import mod; mod.str = my_str) is another special case that needs to be addressed. If we can assume that all pyc files are compiled with the same set of default bulitins (which should be assured by the by the version in the pyc file) then there are two ways that things like mod.str = my_str could be handled. I believe that currently mod.str = my_str alters the module's global hash table (f-f_globals in the code). One way of handling it is to alter STORE_ATTR (op code for assigning to mod.str) to always check to see if the key being assigned is one of the default builtins. If it is, then the module's indexed array of builtins is assigned to. Alternatively if we also wanted to optimize mod.str = my_str then there could be a new opcode like STORE_ATTR that would take an index into the array of builtins instead of an index into the names. PEP 280, which Nick mentioned, talks about a cells, a hybrid data structure that can do both hash table lookups and lookups by index efficiently. That's great, but I'm curious if additional gains can be made be focusing just on builtins. -- --- | Steven Elliott | [EMAIL PROTECTED] | --- ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Making builtins more efficient
At 08:50 AM 3/9/2006 -0600, Steven Elliott wrote: I believe that currently mod.str = my_str alters the module's global hash table (f-f_globals in the code). One way of handling it is to alter STORE_ATTR (op code for assigning to mod.str) to always check to see if the key being assigned is one of the default builtins. If it is, then the module's indexed array of builtins is assigned to. It's not the opcode that would change, it's the C function referenced by the module type's tp_setattro function slot. This has already been attempted before, in order to implement a warning for this behavior. You might want to research that, because the patch ended up being backed out for some reason. I think it ended up being too strict of a check to be accepted for Python 2.4. If some version of it could come back, however, it's possible we could use this in Python 2.5 to allow -O compilation to assume that unshadowed builtins are static, making it potentially possible to convert some of them to specialized bytecodes, or perhaps a BUILTIN_OP n bytecode, where 'n' is a combination of an operation selector and the number of arguments to be taken from the stack. The determination of unshadowed would have to be conservative, in that 'from foo import *' might shadow a builtin, so using 'import *' would disable optimization of all builtins. However, if that were not present, and there's no statically detectable assignment shadowing a particular builtin, that builtin could be optimized. (Assuming -O, of course.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Making builtins more efficient
[Steven Elliott] As you probably know each access of a builtin requires two hash table lookups. First, the builtin is not found in the list of globals. It is then found in the list of builtins. If someone really cared about the double lookup, they could flatten a level by starting their modules with: from __builtin__ import * However, we don't see people writing this kind of code. That could mean that the double lookup hasn't been a big concern. Why not have a means of referencing the default builtins with some sort of index the way the LOAD_FAST op code currently works? FWIW, there was a PEP proposing a roughly similar idea, but the PEP encountered a great deal of resistance: http://www.python.org/doc/peps/pep-0329/ When it comes time to write your PEP, it would helpful to highlight how it differs from PEP 329 (i.e. implemented through the compiler rather than as a bytecode hack, etc.). Perhaps what I'm suggesting isn't feasible for reasons that have already been discussed. But it seems like it should be possible to make while True as efficient as while 1. That is going to be difficult as long as it is legal to write: True = 0 Raymond ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Making builtins more efficient
Steven Elliott [EMAIL PROTECTED] wrote: I'm interested in how builtins could be more efficient. I've read over some of the PEPs having to do with making global variables more efficient (search for global): http://www.python.org/doc/essays/pepparade.html But I think the problem can be simplified by focusing strictly on builtins. Whether or not it is a good idea, it is not wholly uncommon for users to manipulate variables in the builtin or other module namespaces. From very simple import hooks replacing __import__, to module.QUIT = 1, these things happen, and I don't think that making them produce a warning as suggested by Paul, necessarily helps us. I personally think the problem can be simplified (and solved) by applying decorators where such builtin lookups could/should be more efficient. Raymond has a recipe for doing such optimization over at the Python cookbook: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/277940 Toss that little guy into a decorators module; if people want their runtime calling to be fast (at the expense of a slightly increased compilation time), they'll start using it. - Josiah ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Making builtins more efficient
* Paul Moore wrote: If it *is* possible, I'd say it's worth implementing at least a warning sooner rather than later - the practice seems questionable at best, and any progress towards outlawing it would help in work on optimising builtins. /lurk FWIW, this practice is very handy for unit tests. For example, I often shadow builtins like ``file`` for my tests with a mock placed from the outside into the global namespace. nd lurk -- Das Verhalten von Gates hatte mir bewiesen, dass ich auf ihn und seine beiden Gefährten nicht zu zählen brauchte -- Karl May, Winnetou III Im Westen was neues: http://pub.perlig.de/books.html#apache2 ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Making builtins more efficient
At 08:51 AM 3/9/2006 -0800, Raymond Hettinger wrote: Perhaps what I'm suggesting isn't feasible for reasons that have already been discussed. But it seems like it should be possible to make while True as efficient as while 1. That is going to be difficult as long as it is legal to write: True = 0 In that case, it can obviously be determined statically that the builtin is shadowed, and optimization is impossible for that builtin in that module. What *is* difficult is the somemodule.True = 0 case, and the somemodule.__dict__['True']=0 case, and the globals()['True'] = 0 case, and other such fiddling. Personally, I think that such shenanigans (in the case where somemodule doesn't already have a module-level binding for the builtin) should not be allowed to change the semantics of optimized functions using that dictionary as a global namespace. I think this is a reasonable implementation limit on dynamicity, although to be conservative I think we should only do this with -O in Python 2.5, if we do it at all. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Making builtins more efficient
Phillip J. Eby wrote: I think this is a reasonable implementation limit on dynamicity, Absolutely. although to be conservative I think we should only do this with -O in Python 2.5, if we do it at all. Or with a __future__ directive? Just ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Making builtins more efficient
Steven Elliott wrote: One way of handling it is to alter STORE_ATTR (op code for assigning to mod.str) to always check to see if the key being assigned is one of the default builtins. If it is, then the module's indexed array of builtins is assigned to. As long as you're going to all that trouble, it doesn't seem like it would be much harder to treat all global names that way, instead of just a predefined set. The compiler already knows all of the names that are used as globals in the module's code. That's great, but I'm curious if additional gains can be made be focusing just on builtins. As long as builtins can be shadowed, I can't see how to make any extra use of the fact that it's a builtin. A semantic change would be needed, such as forbidding shadowing of builtins, or at least forbidding this from outside the module. Greg ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Making builtins more efficient
Raymond Hettinger wrote: That is going to be difficult as long as it is legal to write: True = 0 BTW, are there any plans to make True and False hard constants in 3.0 (like None is now)? Maybe also others like Ellipsis, NotImplemented, etc. Greg ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Making builtins more efficient
On 3/9/06, Greg Ewing [EMAIL PROTECTED] wrote: BTW, are there any plans to make True and False hard constants in 3.0 (like None is now)? Maybe also others like Ellipsis, NotImplemented, etc. I don't think we should make any of these keywords. -- --Guido van Rossum (home page: http://www.python.org/~guido/) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Making builtins more efficient
At 12:46 PM 3/10/2006 +1300, Greg Ewing wrote: Steven Elliott wrote: One way of handling it is to alter STORE_ATTR (op code for assigning to mod.str) to always check to see if the key being assigned is one of the default builtins. If it is, then the module's indexed array of builtins is assigned to. As long as you're going to all that trouble, it doesn't seem like it would be much harder to treat all global names that way, instead of just a predefined set. The compiler already knows all of the names that are used as globals in the module's code. But knowing that an operation is a builtin allows for the possibility of invoking the equivalent C operation directly in the interpreter (e.g. via a jump table), thus letting us translate something like len(foo) from: LOAD_GLOBAL len LOAD_FAST foo CALL_FUNCTION 1 into something like: LOAD_FAST foo BUILTIN_OP len, 1 And, the BUILTIN_OP could invoke a C function passing in a pointer to the arguments on the stack, so as to bypass tuple allocation, and the C function would use PySequence_Length() rather than going through the Python calling convention to PyObject_Call() the 'len' object. Now, whether that'll actually speed real programs up much, I don't know. But there are certainly a lot of things being cut out of the process using this approach, including an opcode fetch/decode, two dictionary lookups (one failing, one successful), and perhaps some tuplefying (only for C funcs w argcount1, since those builtins don't need an argtuple IIRC). And, even if it does speed things up a good deal, there's still a question of whether it should be done, when some real systems hack modules' builtins for testing. However, if BUILTIN_OP were to verify at runtime that __builtins__ is the interpreter standard builtins (i.e., the restricted-mode check), then it could dynamically choose to do the slower operation lookup. That would allow you to hack a module's builtins by giving it a fresh __builtins__ dictionary to implement that kind of monkeypatching. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] Making builtins more efficient
I'm interested in how builtins could be more efficient. I've read over some of the PEPs having to do with making global variables more efficient (search for global): http://www.python.org/doc/essays/pepparade.html But I think the problem can be simplified by focusing strictly on builtins. One of my assumptions is that only a small fractions of modules override the default builtins with something like: import mybuiltins __builtins__ = mybuiltins As you probably know each access of a builtin requires two hash table lookups. First, the builtin is not found in the list of globals. It is then found in the list of builtins. Why not have a means of referencing the default builtins with some sort of index the way the LOAD_FAST op code currently works? In other words, by default each module gets the default set of builtins indexed (where the index indexes into an array) in a certain order. The version stored in the pyc file would be bumped each time the set of default builtins is changed. I don't have very strong feelings whether things like True = (1 == 1) would be a syntax error, but assigning to a builtin could just do the equivalent of STORE_FAST. I also don't have very strong feelings about whether the array of default builtins would be shared between modules. To simulate the current behavior where attempting to assign to builtin actually alters that module's global hashtable a separate array of builtins could be used for each module. As to assigning to __builtins__ (like I mentioned at the beginning of this post) perhaps it could assign to the builtin array for those items that have a name that matches a default builtin (such as True or len). Those items that don't match a default builtin would just create global variables. Perhaps what I'm suggesting isn't feasible for reasons that have already been discussed. But it seems like it should be possible to make while True as efficient as while 1. -- --- | Steven Elliott | [EMAIL PROTECTED] | --- ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com