Re: [Python-Dev] Moving the developer docs?
Hello All,

I am new to this list, but I have been lurking around getting a feel for the environment and processes. I had some discussion yesterday about the developer documentation as well, since it’s what I do professionally. I am a technical writer but also work in the web development arena (using Django). In fact, one of my projects now is to develop a comprehensive platform for distributing online help, user documentation, etc., which I am just about to put up on BitBucket (winter ’10).

Anyway, that said, with regard to wikis: I have worked in several organizations where almost all of the development documentation was maintained on a wiki. This can be great for getting up and running with something quickly, but over time it becomes very unmanageable and confusing. What I have done in various organizations has been to create a system where an official repository is kept with all of the *official* documentation, along with a way for users (developers) to submit proposals for what they would like to add and change. These proposals are kept in a tracker where they are read and evaluated. Generally, some discussion ensues and a decision is made as to what stays published and what is changed.

This is what the system I am writing is all about as well. It maintains the documentation, and allows users to comment on various parts of that documentation and to submit requests to change or add to it. The admins can then change the documentation or deny the request based on community response.

Anyway, I am not pitching my idea or trying to push my system, but I will be releasing it before winter on BitBucket for anyone to try and distribute freely. I do, however, discourage the use of wikis at all costs. It has been said that they feel loose and unofficial, and although that may not be the intent, over time this becomes reality. Anyway, thank you for your time.
Warmest Regards,
Steve

On Thu, Sep 23, 2010 at 11:06 AM, Dirkjan Ochtman dirk...@ochtman.nl wrote:
> On Thu, Sep 23, 2010 at 16:56, Guido van Rossum gu...@python.org wrote:
> > I want to believe your theory (since I also have a feeling that some wiki pages feel less trustworthy than others) but my own use of Wikipedia makes me skeptical that this is all there is -- on many pages on important topics you can clearly tell that a lot of effort went into the article, and then I trust it more. On other places you can tell that almost nobody cared. But I never look at the names of the authors.
>
> Right -- I feel like wiki quality varies with the amount of attention spent on maintaining it. Wikis that get a lot of maintenance (or have someone devoted to wiki gardening) will be good (consistent and up to date), while wikis that are only occasionally updated, or updated without much consistency or added to without editing get to feel bad. Seems like a variation of the broken window theory.
>
> So what we really need is a way to make editing the developer docs more rewarding (or less hard) for potential authors (i.e. python committers). If putting it in a proper VCS so they can use their editor of choice would help that, that seems like a good solution.
>
> Cheers,
> Dirkjan

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/stevenrelliottjr1%40gmail.com
Re: [Python-Dev] Moving the developer docs?
> If we can recruit a bunch of somebodies who *do* care, then the wiki would be much more useful. But I still don't want to edit the dev docs there, if I have a choice :) There's a reason I stopped updating the wiki as soon as I moved to a code repository.

I think that there are plenty who do care; I for one would be more than happy to work on whatever documentation needs might arise for this group. I am a bit of a documentation nut, since it's what I do, and I also come from the Django camp, where people are obsessive about documentation. I still think that wikis are not the best solution, but if the wiki is something that needs to be tightened up, then I would personally have no problem working on it.
Re: [Python-Dev] Making builtins more efficient
On Thu, 2007-02-22 at 01:26 +0100, Giovanni Bajo wrote:
> On 20/02/2007 16.07, Steven Elliott wrote:
> > I'm finally getting back into this. I'd like to take one more shot at it with a revised version of what I proposed before. For those of you that did not see the original thread it was about ways that accessing builtins could be more efficient. It's a bit much to summarize again now, but you should be able to find it in the archive with this subject and a date of 2006-03-08.
>
> Are you aware of this patch, which is still awaiting review?
> https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1616125&group_id=5470

I was not aware of your patch. I've since downloaded it, applied it, and played with it a bit. I find the cached module lookups (cached lookups when loading attributes in modules via LOAD_ATTR) to be particularly interesting since they address a case where PEP 280 leaves off.

Your idea is to have an indexable array of objects that is only used when the hash table has not been changed, which can be determined by the timestamps you added. That may be the best way of handling attributes in modules (LOAD_ATTR). For global variables (LOAD_GLOBAL) I'm curious how it compares to PEP 280 and/or Greg Ewing's idea.

--
| Steven Elliott | [EMAIL PROTECTED] |
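For readers who have not looked at the patch, here is a minimal sketch of the general idea of a lookup cache that is only trusted while the hash table is unchanged. It is not the patch's actual implementation: the names VersionedDict and CachedLookup are hypothetical, a simple counter stands in for the timestamps, and only item assignment and deletion are tracked.

```python
class VersionedDict(dict):
    """Dict with a counter that is bumped on item assignment and deletion.

    Hypothetical stand-in for a timestamped namespace; other mutating
    methods (update, pop, ...) are deliberately not covered in this sketch.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version += 1


class CachedLookup:
    """One lookup site that remembers a value and the version it was seen at."""

    def __init__(self, namespace, name):
        self.namespace = namespace
        self.name = name
        self.cached_version = -1   # never matches a real version
        self.cached_value = None
        self.hits = 0
        self.misses = 0

    def load(self):
        if self.cached_version == self.namespace.version:
            # Namespace unchanged since the last lookup: skip the hash lookup.
            self.hits += 1
            return self.cached_value
        # Namespace changed (or first use): do the real lookup and re-cache.
        self.misses += 1
        self.cached_value = self.namespace[self.name]
        self.cached_version = self.namespace.version
        return self.cached_value


namespace = VersionedDict(len=len)
site = CachedLookup(namespace, "len")
site.load()              # first call misses and fills the cache
site.load()              # second call is served from the cache
namespace["len"] = max   # any tracked write invalidates the cache
site.load()              # misses again and picks up the new value
```

The trade-off in a scheme like this is that every write to the namespace invalidates all cached sites at once, which is cheap to check but coarse.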
Re: [Python-Dev] Making builtins more efficient
On Tue, 2007-02-20 at 07:48 -0800, Guido van Rossum wrote:
> If this is not a replay of an old message, please move the discussion to python-ideas.

It's a modified version of an old idea, so I wasn't sure where to post it since previously it was discussed here. I'll look into python-ideas.

--
| Steven Elliott | [EMAIL PROTECTED] |
Re: [Python-Dev] Making builtins more efficient
I'm finally getting back into this. I'd like to take one more shot at it with a revised version of what I proposed before. For those of you that did not see the original thread it was about ways that accessing builtins could be more efficient. It's a bit much to summarize again now, but you should be able to find it in the archive with this subject and a date of 2006-03-08.

On Fri, 2006-03-10 at 12:46 +1300, Greg Ewing wrote:
> Steven Elliott wrote:
> > One way of handling it is to alter STORE_ATTR (op code for assigning to mod.str) to always check to see if the key being assigned is one of the default builtins. If it is, then the module's indexed array of builtins is assigned to.
>
> As long as you're going to all that trouble, it doesn't seem like it would be much harder to treat all global names that way, instead of just a predefined set. The compiler already knows all of the names that are used as globals in the module's code.

What I have in mind may be close to what you are suggesting above. My thought now is that builtins are a set of tokens that typically, but don't necessarily, point to the same objects in all modules. Such tokens, which I'll refer to as global tokens, can be roughly broken into two sets:

1) Global tokens that typically point to the same object in all modules.
2) Global tokens that are likely to point to different objects (or be undefined) in different modules.

Set 1) is pretty much the builtins. True and len are likely to point to the same objects in all modules, but not necessarily. Set 2) might be things like os and sys which are often defined (imported) in modules, but not necessarily.

Access to the globals of a module, including the current module, is done with one of three opcodes (LOAD_GLOBAL, LOAD_ATTR and LOAD_NAME).
For each of these opcodes the following snippet of code from ceval.c (for LOAD_GLOBAL) is relevant to this discussion:

    /* This is the un-inlined version of the code above */
    x = PyDict_GetItem(f->f_globals, w);
    if (x == NULL) {
        x = PyDict_GetItem(f->f_builtins, w);
        if (x == NULL) {
          load_global_error:
            format_exc_check_arg(
                PyExc_NameError,
                GLOBAL_NAME_ERROR_MSG, w);
            break;
        }
    }

So, to avoid the hash table lookups above maybe the global tokens could be assigned an index value that is fixed for any given version of the interpreter and that is the same for all modules (that True is always index 7, len is always index 3, etc.)

Once a set of indexes has been determined, a new opcode, that I'll call LOAD_GTOKEN, could be created that avoids the hash table lookup by functioning in a way that is similar to LOAD_FAST (pull a local variable value out of an array). For example, static references to True could always be compiled to

    LOAD_GTOKEN 7 (True)

As to set 1) and set 2) that I mentioned above - there is only a need to distinguish between the two sets if a copy-on-write mechanism is used. That way global tokens that are likely to have their value changed (group 2) ) can all be together in one group so that only that group needs to be copied when one of the global tokens is written to. For example code such as:

    True = 1
    print True

would be compiled into something like:

    1       LOAD_CONST       1 (1)
            STORE_GTOKEN1    7 (True)

    2       LOAD_GTOKEN1     7 (True)
            PRINT_ITEM
            PRINT_NEWLINE

Note that 1 has been appended to STORE_GTOKEN to indicate that group 1) is being worked with. The store command will copy the array of pointers once, the first time it is called.

Just as a new opcode is needed for LOAD_GLOBAL one would be needed for LOAD_ATTR. Perhaps LOAD_ATOKEN would work.
For example:

    amodule.len = my_len
    print amodule.len

would be compiled into something like:

    1       LOAD_GLOBAL      0 (my_len)
            LOAD_GLOBAL      1 (amodule)
            STORE_ATOKEN1    3 (len)

    2       LOAD_GLOBAL      1 (amodule)
            LOAD_ATOKEN1     3 (len)
            PRINT_ITEM
            PRINT_NEWLINE
            LOAD_CONST       0 (None)
            RETURN_VALUE

Note that it looks almost identical to the code that is currently generated, but the oparg 3 shown for the LOAD_ATOKEN1 above indexes into an array (like LOAD_FAST) to get at the attribute directly, whereas the oparg that would be shown for LOAD_ATTR is an index into an array of constants/strings which is then used to retrieve the attribute from the module's global hash table.

> > That's great, but I'm curious if additional gains can be made by focusing just on builtins.
>
> As long as builtins can be shadowed, I can't see how to make any extra use of the fact that it's a builtin. A semantic change would be needed, such as forbidding shadowing of builtins, or at least forbidding this from outside the module.

I now think that it is best not to think of builtins as being a special case. What really matters
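The compiled sequence for "True = 1; print True" in the message above can be sketched as a toy interpreter loop. This is only an illustration under stated assumptions: the opcode names come from the message, but the run() function, the Module class and the table size are hypothetical, the group suffix ("1") is dropped so there is a single group, and PRINT_ITEM collects values in a list instead of writing to stdout.

```python
# Hypothetical fixed index table. 3 and 7 are the example indexes
# used in the message; nothing in CPython assigns these numbers.
GTOKEN_INDEX = {"len": 3, "True": 7}
TABLE_SIZE = 8

SHARED_TOKENS = [None] * TABLE_SIZE
SHARED_TOKENS[GTOKEN_INDEX["len"]] = len
SHARED_TOKENS[GTOKEN_INDEX["True"]] = True


class Module:
    """Module state for the sketch: starts out sharing the default array."""

    def __init__(self):
        self.tokens = SHARED_TOKENS
        self.owns_tokens = False


def run(module, code, consts):
    """Execute (opname, oparg) pairs and return the list of printed values."""
    stack, printed = [], []
    for opname, oparg in code:
        if opname == "LOAD_CONST":
            stack.append(consts[oparg])
        elif opname == "LOAD_GTOKEN":
            # Plain array index, no hash lookup.
            stack.append(module.tokens[oparg])
        elif opname == "STORE_GTOKEN":
            if not module.owns_tokens:
                # Copy on first write so other modules keep the defaults.
                module.tokens = list(module.tokens)
                module.owns_tokens = True
            module.tokens[oparg] = stack.pop()
        elif opname == "PRINT_ITEM":
            printed.append(stack.pop())
        else:
            raise ValueError("unknown opcode: %s" % opname)
    return printed


# "True = 1; print True" as in the listing (PRINT_NEWLINE omitted).
code = [("LOAD_CONST", 1), ("STORE_GTOKEN", 7),
        ("LOAD_GTOKEN", 7), ("PRINT_ITEM", None)]
shadowing = Module()
output = run(shadowing, code, (None, 1))
```

After this runs, the shadowing module has its own copy of the array while SHARED_TOKENS still holds the original True for every other module.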
Re: [Python-Dev] Making builtins more efficient
On Thu, 2006-03-09 at 08:51 -0800, Raymond Hettinger wrote:
> [Steven Elliott]
> > As you probably know each access of a builtin requires two hash table lookups. First, the builtin is not found in the list of globals. It is then found in the list of builtins.
>
> If someone really cared about the double lookup, they could flatten a level by starting their modules with:
>
>     from __builtin__ import *
>
> However, we don't see people writing this kind of code. That could mean that the double lookup hasn't been a big concern.

It could mean that. I think what you are suggesting is sufficiently clever that the average Python coder may not have thought of it. In any case, many people are willing to do "while 1" instead of "while True" to avoid the double lookup. And the "from __builtin__ import *" additionally imposes a startup cost and memory cost (at least a word per builtin, I would guess).

> > Why not have a means of referencing the default builtins with some sort of index the way the LOAD_FAST op code currently works?
>
> FWIW, there was a PEP proposing a roughly similar idea, but the PEP encountered a great deal of resistance:
>
>     http://www.python.org/doc/peps/pep-0329/
>
> When it comes time to write your PEP, it would be helpful to highlight how it differs from PEP 329 (i.e. implemented through the compiler rather than as a bytecode hack, etc.).

I'm flattered that you think it might be worthy of a PEP. I'll look into doing that.

> > Perhaps what I'm suggesting isn't feasible for reasons that have already been discussed. But it seems like it should be possible to make "while True" as efficient as "while 1".
>
> That is going to be difficult as long as it is legal to write:
>
>     True = 0

LOAD_BUILTIN (or whatever we want to call it) should be as fast as LOAD_FAST (locals) or LOAD_CONST in that they each index into an array where the index is the argument to the opcode.

I'll look into writing a PEP.
--
| Steven Elliott | [EMAIL PROTECTED] |
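The double lookup discussed in this message, and the flattening trick Raymond mentions, can be sketched in plain Python. This is a rough model rather than what the interpreter does internally: load_global and the counter dict are hypothetical names, and it uses the Python 3 module name builtins where the thread's Python 2 code says __builtin__.

```python
import builtins  # Python 3 name; the thread's Python 2 spelling is __builtin__


def load_global(name, module_globals, counter):
    """Model the globals-then-builtins lookup, counting hash lookups."""
    counter["lookups"] += 1
    try:
        return module_globals[name]
    except KeyError:
        pass
    # Not a module global: fall back to the builtins table.
    counter["lookups"] += 1
    try:
        return vars(builtins)[name]
    except KeyError:
        raise NameError("name %r is not defined" % name) from None


# Ordinary module: the builtin is found on the second lookup.
counter = {"lookups": 0}
load_global("len", {}, counter)
two_level_cost = counter["lookups"]

# "Flattened" module, roughly what 'from __builtin__ import *' achieves:
# every builtin is copied into the module's own globals up front.
flattened = dict(vars(builtins))
counter = {"lookups": 0}
load_global("len", flattened, counter)
flattened_cost = counter["lookups"]
```

The flattened dictionary is where the startup and memory cost mentioned above comes from: one entry per builtin, per module.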
Re: [Python-Dev] Making builtins more efficient
On Fri, 2006-03-10 at 12:46 +1300, Greg Ewing wrote:
> Steven Elliott wrote:
> > One way of handling it is to alter STORE_ATTR (op code for assigning to mod.str) to always check to see if the key being assigned is one of the default builtins. If it is, then the module's indexed array of builtins is assigned to.
>
> As long as you're going to all that trouble, it doesn't seem like it would be much harder to treat all global names that way, instead of just a predefined set. The compiler already knows all of the names that are used as globals in the module's code.

The important difference between builtins and globals is that with builtins both the compiler and the runtime can enumerate all references to builtins in a single consistent way. That is, True can always be builtin #3 and len can always be builtin #17, or whatever.

This isn't true of globals, in that a pyc file referencing a global in a module may have been compiled with a different version of that module (that is, some_module.some_global can't be compiled to a single fixed index since stuff may shift around in some_module). With globals you have the same kind of problem that you have with operating systems that use ordinals to refer to symbols in shared libraries.

So in the case of a static reference to a builtin (while True, or whatever) the compiler would generate something that refers to it with that builtin's index (such as a new BUILTIN_OP opcode, as Philip suggested). Ordinary globals (non-builtins) would continue to be generated as the same code (the LOAD_GLOBAL opcode; I'll only refer to the loading opcodes in this email).

A dynamic reference to a builtin (eval('True = 7') or from foo import * or whatever) would generate the opcode that indicates that the runtime needs to figure out what to do (the same LOAD_NAME opcode). The second part of the LOAD_NAME opcode is similar to the current LOAD_GLOBAL opcode - it first checks the hash table of globals and then checks the hash table of builtins.
However, the second part of the LOAD_NAME opcode could be implemented such that it first checks against a list of default builtins (which could be a hash table that returns the index of that builtin) and then indexes into the array of builtins if it is found, or retrieves it from the single hash table of globals otherwise.

So the LOAD_NAME opcode (or similar attempts to dynamically get a name) would be almost as efficient as it currently is.

> > That's great, but I'm curious if additional gains can be made by focusing just on builtins.
>
> As long as builtins can be shadowed, I can't see how to make any extra use of the fact that it's a builtin. A semantic change would be needed, such as forbidding shadowing of builtins, or at least forbidding this from outside the module.

One way of looking at it is that rather than having a clear distinction between builtins and globals (as there currently is) there would be a single global name space that internally in Python is implemented in two data structures: an array for frequently used names and a hash table for infrequently used names. And the division between the two wouldn't even have to be between globals and builtins like we've been talking about so far.

What distinguishes the builtins is that you get them for free (initialized on startup). So, it would be possible to insert infrequently used builtins into the hash table of infrequently used names only when the module refers to them.

Conversely, names that aren't builtins, but that are used frequently in many different modules, such as sys or os, could have indexes set aside for them in the array of frequently used names. Later, when such a name gets a value (because sys is imported, or whatever) it just gets stuck into the predetermined slot in the array of frequently used names.

Since builtins can be shadowed, as you point out, there would have to be one array of frequently used names per module. But often it would be the same as other modules.
So internally, as a matter of efficiency, the interpreter could use a copy-on-write strategy where a global array of frequently used names is used by the module until it assigns to True, or something like that, at which point it gets its own copy.

--
| Steven Elliott | [EMAIL PROTECTED] |
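A small sketch of the two-structure namespace described in this message: an array for frequently used names, with slots reserved for sys and os before they have values, and a hash table for everything else. The Namespace class, the FREQUENT tuple and the slot order are all hypothetical, and the copy-on-write sharing between modules is left out to keep it short.

```python
import sys

_UNSET = object()  # marks a reserved slot that has no value yet

# Hypothetical list of "frequently used" names and their fixed slots.
FREQUENT = ("True", "len", "sys", "os")
SLOT = {name: index for index, name in enumerate(FREQUENT)}


class Namespace:
    """One module's names: an array for frequent names, a dict for the rest."""

    def __init__(self):
        self.slots = [_UNSET] * len(FREQUENT)
        # Builtins come "for free": filled in at startup.
        self.slots[SLOT["True"]] = True
        self.slots[SLOT["len"]] = len
        self.rare = {}

    def store(self, name, value):
        index = SLOT.get(name)
        if index is not None:
            self.slots[index] = value   # predetermined slot
        else:
            self.rare[name] = value     # ordinary hash table entry

    def load_slot(self, index):
        """Static reference: the compiler already knows the index."""
        value = self.slots[index]
        if value is _UNSET:
            raise NameError("name %r is not defined" % FREQUENT[index])
        return value

    def load(self, name):
        """Dynamic reference: one hash lookup to find the index, if any."""
        index = SLOT.get(name)
        if index is not None:
            return self.load_slot(index)
        try:
            return self.rare[name]
        except KeyError:
            raise NameError("name %r is not defined" % name) from None


namespace = Namespace()
namespace.store("sys", sys)       # as if the module had run "import sys"
namespace.store("helper", 42)     # infrequent name, goes to the hash table
loaded_sys = namespace.load_slot(SLOT["sys"])
```

Note that the dynamic path still costs one hash lookup (to map the name to its slot), which matches the remark that dynamic access would be almost as efficient as it is today rather than faster.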
Re: [Python-Dev] Making builtins more efficient
On Thu, 2006-03-09 at 12:00 +, Paul Moore wrote:
> On 3/9/06, Nick Coghlan [EMAIL PROTECTED] wrote:
> > Steven Elliott wrote:
> > > I'm interested in how builtins could be more efficient. I've read over some of the PEPs having to do with making global variables more efficient (search for global):
> > >     http://www.python.org/doc/essays/pepparade.html
> > > But I think the problem can be simplified by focusing strictly on builtins.
> >
> > Unfortunately, builtins can currently be shadowed in the module global namespace from outside the module (via constructs like import mod; mod.str = my_str). Unless/until that becomes illegal, focusing solely on builtins doesn't help - the difficulties lie in optimising builtin access while preserving the existing name shadowing semantics.
>
> Is there any practical way of detecting and flagging constructs like the above (remotely shadowing a builtin in another module)? I can't see a way of doing it (but I know very little about this area...).

It may be possible to flag it, or it may be possible to make it work. In my post I mentioned one special case that needs to be addressed (assigning to __builtins__). What Nick mentioned in his post (import mod; mod.str = my_str) is another special case that needs to be addressed. If we can assume that all pyc files are compiled with the same set of default builtins (which should be assured by the version in the pyc file) then there are two ways that things like mod.str = my_str could be handled.

I believe that currently mod.str = my_str alters the module's global hash table (f->f_globals in the code). One way of handling it is to alter STORE_ATTR (op code for assigning to mod.str) to always check to see if the key being assigned is one of the default builtins. If it is, then the module's indexed array of builtins is assigned to.

Alternatively, if we also wanted to optimize mod.str = my_str then there could be a new opcode like STORE_ATTR that would take an index into the array of builtins instead of an index into the names.
PEP 280, which Nick mentioned, talks about cells, a hybrid data structure that can do both hash table lookups and lookups by index efficiently. That's great, but I'm curious if additional gains can be made by focusing just on builtins.

--
| Steven Elliott | [EMAIL PROTECTED] |
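The first option above (having the attribute store check whether the key is a default builtin) can be modelled with __setattr__ on a stand-in module object. This is only a sketch of the control flow: SketchModule and DEFAULT_BUILTIN_INDEX are hypothetical, the index numbers are arbitrary, and real modules do not work this way.

```python
# Hypothetical fixed index for a few default builtins.
DEFAULT_BUILTIN_INDEX = {"len": 0, "str": 1, "True": 2}


class SketchModule:
    """Stand-in for a module with an indexed array of builtins."""

    def __init__(self):
        # Bypass our own __setattr__ while setting up internal state.
        object.__setattr__(self, "builtin_slots", [len, str, True])
        object.__setattr__(self, "globals_table", {})

    def __setattr__(self, name, value):
        # The STORE_ATTR-style check: is this one of the default builtins?
        index = DEFAULT_BUILTIN_INDEX.get(name)
        if index is not None:
            self.builtin_slots[index] = value   # shadow via the array
        else:
            self.globals_table[name] = value    # ordinary global

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        index = DEFAULT_BUILTIN_INDEX.get(name)
        if index is not None:
            return self.builtin_slots[index]
        try:
            return self.globals_table[name]
        except KeyError:
            raise AttributeError(name) from None


def my_str(value):
    """Replacement used to shadow str from outside the module."""
    return "<%s>" % (value,)


mod = SketchModule()
mod.str = my_str     # remote shadowing lands in the indexed array
mod.answer = 42      # non-builtin name lands in the hash table
```

Code compiled against the fixed indexes would then see my_str in slot 1 without any hash lookup, which is the point of routing the store through the array.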
[Python-Dev] Making builtins more efficient
I'm interested in how builtins could be more efficient. I've read over some of the PEPs having to do with making global variables more efficient (search for global):

    http://www.python.org/doc/essays/pepparade.html

But I think the problem can be simplified by focusing strictly on builtins.

One of my assumptions is that only a small fraction of modules override the default builtins with something like:

    import mybuiltins
    __builtins__ = mybuiltins

As you probably know, each access of a builtin requires two hash table lookups. First, the builtin is not found in the list of globals. It is then found in the list of builtins.

Why not have a means of referencing the default builtins with some sort of index the way the LOAD_FAST op code currently works? In other words, by default each module gets the default set of builtins indexed (where the index indexes into an array) in a certain order. The version stored in the pyc file would be bumped each time the set of default builtins is changed.

I don't have very strong feelings about whether things like True = (1 == 1) would be a syntax error, but assigning to a builtin could just do the equivalent of STORE_FAST. I also don't have very strong feelings about whether the array of default builtins would be shared between modules. To simulate the current behavior, where attempting to assign to a builtin actually alters that module's global hash table, a separate array of builtins could be used for each module.

As to assigning to __builtins__ (like I mentioned at the beginning of this post) perhaps it could assign to the builtin array for those items that have a name that matches a default builtin (such as True or len). Those items that don't match a default builtin would just create global variables.

Perhaps what I'm suggesting isn't feasible for reasons that have already been discussed. But it seems like it should be possible to make "while True" as efficient as "while 1".
--
| Steven Elliott | [EMAIL PROTECTED] |
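To see which opcode a builtin reference compiles to, the standard dis module can be used. The snippet below is a small check rather than part of the proposal; count_items is a hypothetical function, and it assumes a Python 3 interpreter, where exact opcode names for local variables vary between versions, so only the global loads are collected.

```python
import dis


def count_items(items):
    """Tiny function whose only non-local name is the builtin len."""
    return len(items)


# Collect the names loaded through the global/builtin lookup path.
global_loads = [
    instruction.argval
    for instruction in dis.get_instructions(count_items)
    if instruction.opname == "LOAD_GLOBAL"
]
print(global_loads)
```

On the interpreters I would expect this to be run on, the list should include 'len', meaning a builtin reference goes through the same opcode as an ordinary module global, which is the two-step lookup the message describes.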