[Python-Dev] Simple syntax proposal: not is
I'm writing a source code editor that translates identifiers and keywords on-screen into a different natural language. This tool will do no transformations except at the reversible word level. There is one simple, avoidable case where this results in nonsense in many languages: "is not". I propose allowing "not is" as an acceptable alternative to "is not".

Obviously English syntax has a deep influence on Python syntax, and I would never propose deeper syntactical changes for the sake of natural-language compatibility. This is a trivial change, one that is still easily parseable by an English-native mind (and IMO actually makes more sense logically, since it does not invite confusion with the nonsensical "is (not ...)").

The use cases where you have to grep for "is not" are few, and the "(is not)|(not is)" pattern that would replace it is still pretty simple.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
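To make the grep point concrete, here is a minimal sketch of searching for both spellings. The word boundaries and flexible whitespace are additions of this sketch, not part of the pattern quoted above, and the helper names are hypothetical. The second helper only illustrates that "not is" is rejected by the compiler in current Python, which is exactly what the proposal would change.

```python
import re

# The message's (is not)|(not is) pattern; the \b anchors and \s+ are
# assumptions of this sketch, added to avoid matching inside longer words.
IS_NOT_EITHER_ORDER = re.compile(r"\b(?:is\s+not|not\s+is)\b")


def find_negated_identity(source):
    """Return each 'is not' / 'not is' occurrence, whitespace-normalized."""
    return [" ".join(m.group(0).split())
            for m in IS_NOT_EITHER_ORDER.finditer(source)]


def compiles(expression):
    """Report whether the running interpreter accepts the expression."""
    try:
        compile(expression, "<example>", "eval")
    except SyntaxError:
        return False
    return True


sample = "a is not b\nc not is d\ne is  not f\n"
print(find_negated_identity(sample))  # ['is not', 'not is', 'is not']
print(compiles("x is not y"))         # True
print(compiles("x not is y"))         # False without the proposed change
```

A plain regex like this also matches inside string literals and comments, so it is only a rough stand-in for a real search over code.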
[Python-Dev] Extend reST spec to allow automatic recognition of identifiers in comments?
This is a VERY VERY rough draft of a PEP. The idea is that there should be some formal way for reST parsers to differentiate, in docstrings and comments, between variable/function names and identical English words.

PEP: XXX
Title: Catching unmarked identifiers in docstrings
Version: 0.0.0.0.1
Last-Modified: 23-Aug-2007
Author: Jameson Quinn firstname dot lastname at gmail
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 23-Aug-2007
Post-History: 30-Aug-2002

Abstract
========

This PEP makes explicit some additional ways to parse docstrings and comments for Python identifiers. These are intended to be implementable on their own or as extensions to reST, and to make as many existing docstrings as possible usable by tools that change the visible representation of identifiers, such as translating (non-English) code editors or visual programming environments. Docstrings in widely-used modules are encouraged to use \`explicit backquotes\` to mark identifiers which are not caught by these cases.

THIS IS AN EARLY DRAFT OF THIS PEP FOR DISCUSSION PURPOSES ONLY. ALL LOGIC IS INTENTIONALLY DEFINED ONLY BY EXAMPLES, AND THERE IS NO REFERENCE IMPLEMENTATION UNTIL THERE ARE AT LEAST GLIMMERINGS OF CONSENSUS ON THE RULE SET.

Rationale
=========

Python, like most computer languages, is based on English. This can represent a hurdle to those who do not speak English. Work is underway on Bityi_, a code viewer/editor which translates code to another language on load and save. Among the many design issues in Bityi is that of identifiers in docstrings. A view which translates the identifiers in code, but leaves the untranslated identifiers in the docstrings, makes the docstrings worse than useless, even if the programmer has a rudimentary grasp of English. Yet if all identifiers in docstrings are translated, there is the problem of overtranslation in either direction. It is necessary to distinguish between the variable named variable, which should be translated, and the comment that something is highly variable, which should not.

.. _Bityi: http://wiki.laptop.org/go/Bityi

Note that this is just one use case; syntax coloring and docstring hyperlinks are others. This PEP is not the place for a discussion of all the pros and cons of a translating viewer.

PEP 287 standardizes reST as an optional way to mark up docstrings. This includes the possibility of using \`backquotes\` to flag Python identifiers. However, as that PEP is purely optional, there are many cases of identifiers in docstrings which are not flagged as such. Moreover, many of these unflagged cases could be caught programmatically. This would considerably reduce the task of making a module internationally viewable, or hyperlinkable.

This syntax is kept relatively open to allow for reuse with other programming languages.

Common cases of identifiers in docstrings
=========================================

The most common case is that of lists of argument or method names. We call these identifier lists::

    def register(func, *targs, **kargs):
        """register a function to be executed someday

        func - function to be called
        targs - optional arguments to pass
        kargs - optional keyword arguments to pass"""

    #func, targs, and kargs would be recognized as identifiers in the above.

    class MyClass(object):
        """Just a silly demonstration, with some methods:

        thisword : is a class method and you can call
            it - it may even return a value.

            As with reST, the associated text can have
            several paragraphs.

            BUT - you can't nest this construct, so BUT isn't counted.
        anothermethod: is another method.
        eventhis -- is counted as a method.

        anynumber --- of dashes are allowed in this syntax

        But consider: two words are NOT counted as an identifier.

        things(that,look,like,functions): are functions (see below)

        Also, the docstring may have explanatory text, below or by
        itself: so we have to deal with that.
        Thus, any paragraph which is NOT preceded by an empty line
        or another identifier list - like "itself" above - does not
        count as an identifier.
        """

    #thisword, anothermethod, eventhis, anynumber, and things would be
    #recognized as identifiers in the above.

Another case is things which look like functions, lists, indexes, or dicts::

    afunction(is,a,word,with,parentheses)
    [a,list,is,a,bunch,of,words,in,brackets]
    anindex[is, like, a, cross, between, the, above]
    {adict:is,just:words,in:curly, brackets: likethis}

    #all of the above would be recognized as identifiers.

The syntax of what goes inside these is very loose.

    identifier_list ::= [initial_word]opening_symbol content_word
                        {separator_symbol content_word} closing_symbol

, with no whitespace after initial_word, and where separator_symbol is the set of symbols .,{}[]+-*^%=|/()[]{} MINUS closing_symbol. content_word
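
The bracketed forms in the draft (function-like, list-like, index-like, dict-like) could be recognized roughly as follows. This is only a sketch under stated assumptions, not a reference implementation: the draft does not say here what a content_word is, so it is assumed to look like a Python identifier; ":" is added to the separator set because the dict example uses it although the listed set omits it; whitespace around separators is tolerated because the examples show it. The name parse_bracketed is hypothetical.

```python
import re

OPENERS = {"(": ")", "[": "]", "{": "}"}
# Separator symbols listed in the draft, plus ":" (assumption: the draft's
# dict example uses it even though the listed set does not include it).
SEPARATORS = set(".,{}[]+-*^%=|/()") | {":"}
# Assumed shape of initial_word and content_word: a Python-style identifier.
WORD = r"[A-Za-z_][A-Za-z0-9_]*"
CONTENT_WORD = re.compile(r"\s*(" + WORD + r")\s*")


def parse_bracketed(text):
    """Return the words of a bracketed construct, or None if no match."""
    text = text.strip()
    words = []
    pos = 0
    initial = re.match(WORD, text)
    if initial:
        words.append(initial.group(0))
        pos = initial.end()  # no whitespace allowed after initial_word
    if pos >= len(text) or text[pos] not in OPENERS:
        return None
    closer = OPENERS[text[pos]]
    separators = SEPARATORS - {closer}
    pos += 1
    while True:
        match = CONTENT_WORD.match(text, pos)
        if not match:
            return None  # a content_word must follow an opener or separator
        words.append(match.group(1))
        pos = match.end()
        if pos >= len(text):
            return None  # ran out of text before the closing symbol
        symbol = text[pos]
        pos += 1
        if symbol == closer:
            return words if pos == len(text) else None
        if symbol not in separators:
            return None


print(parse_bracketed("anindex[is, like, a, cross]"))
# ['anindex', 'is', 'like', 'a', 'cross']
print(parse_bracketed("two words (are, not, counted)"))  # None
```

Because the opening symbols of the other bracket kinds count as separators, the sketch accepts loosely nested text such as f(a(b), which seems in keeping with "very loose" but is an interpretation rather than something the draft states.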