[Python-Dev] Extend reST spec to allow automatic recognition of identifiers in comments?

2008-01-13 Thread Jameson Chema Quinn
This is a VERY VERY rough draft of a PEP. The idea is that there should be
some formal way that reST parsers can differentiate (in docstrings) between
variable/function names and identical English words, within comments.

PEP: XXX
Title: Catching unmarked identifiers in docstrings
Version: 0.0.0.0.1
Last-Modified: 23-Aug-2007
Author: Jameson Quinn firstname dot lastname at gmail
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 23-Aug-2007
Post-History: 30-Aug-2002


Abstract
========

This PEP makes explicit some additional ways to parse docstrings and
comments for Python identifiers. These are intended to be implementable
on their own or as extensions to reST, and to make as many existing
docstrings as possible usable by tools that change the visible
representation of identifiers, such as translating (non-English) code
editors or visual programming environments. Docstrings in widely-used
modules are encouraged to use \`explicit backquotes\` to mark identifiers
which are not caught by these cases.

THIS IS AN EARLY DRAFT OF THIS PEP FOR DISCUSSION PURPOSES ONLY. ALL LOGIC
IS INTENTIONALLY DEFINED ONLY BY EXAMPLES, AND THERE WILL BE NO REFERENCE
IMPLEMENTATION UNTIL THERE ARE AT LEAST GLIMMERINGS OF CONSENSUS ON THE
RULE SET.


Rationale
=========

Python, like most computer languages, is based on English. This can
represent a hurdle to those who do not speak English. Work is underway
on Bityi_, a code viewer/editor which translates code to another language
on load and save. Among the many design issues in Bityi is that of
identifiers in docstrings. A view which translates the identifiers in
code, but leaves the untranslated identifier in the docstrings, makes
the docstrings worse than useless, even if the programmer has a
rudimentary grasp of English. Yet if all identifiers in docstrings are
translated, there is the problem of overtranslation in either direction.
It is necessary to distinguish between the variable named variable,
which should be translated, and the comment that something is highly
variable, which should not.

.. _Bityi: http://wiki.laptop.org/go/Bityi

Note that this is just one use-case; syntax coloring and docstring
hyperlinks are others. This PEP is not the place for a discussion of
all the pros and cons of a translating viewer.

PEP 287 standardizes reST as an optional way to mark up docstrings.
This includes the possibility of using \`backquotes\` to flag Python
identifiers. However, as following PEP 287 is purely optional, there are
many cases of identifiers in docstrings which are not flagged as such.
Moreover, many of these unflagged cases could be caught programmatically.
This would considerably reduce the task of making a module
internationally viewable or hyperlinkable.
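
As a rough illustration of the explicit case, the sketch below pulls
single-backquoted spans out of a docstring with a regular expression. The
helper name ``flagged_identifiers`` and the pattern are assumptions made for
this example only; they are not part of PEP 287 or of any reST parser, and
real reST interpreted text has more rules than this handles.

```python
import re

# Hypothetical helper, not part of PEP 287 or docutils: matches text between
# single backquotes and skips double-backquoted inline literals.
BACKQUOTED = re.compile(r"(?<!`)`([^`\s][^`]*)`(?!`)")


def flagged_identifiers(docstring):
    """Return the single-backquoted spans of a docstring, in order."""
    return [m.group(1) for m in BACKQUOTED.finditer(docstring)]


doc = "Call `register` once; the result is highly variable. See ``literal``."
print(flagged_identifiers(doc))  # prints ['register']
```

Only the explicitly flagged word is returned; the English word "variable"
is left alone, which is the distinction the rest of this PEP tries to
recover for unflagged text.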

This syntax is kept relatively open to allow for reuse with
other programming languages.


Common cases of identifiers in docstrings
=========================================

The most common case is that of lists of argument or
method names. We call these identifier lists::

  def register(func, *targs, **kargs):
      """register a function to be executed someday

      func - function to be called
      targs - optional arguments to pass
      kargs - optional keyword arguments to pass
      """

  #func, targs, and kargs would be recognized as identifiers in the above.

  class MyClass(object):
      """Just a silly demonstration, with some methods:

      thisword : is a class method and you can call
          it - it may even return a value.

          As with reST, the associated text can have
          several paragraphs.

          BUT - you can't nest this construct, so BUT isn't counted.
      anothermethod: is another method.
      eventhis -- is counted as a method.

      anynumber --- of dashes are allowed in this syntax

      But consider: two words are NOT counted as an identifier.

      things(that,look,like,functions): are functions (see below)

      Also, the docstring may have explanatory text, below or by
      itself: so we have to deal with that.
      Thus, any paragraph which is NOT preceded by an empty line
      or another identifier list - like itself above - does not count
      as an identifier.
      """

  #thisword, anothermethod, eventhis, anynumber, and things would be
  #recognized as identifiers in the above.
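
Since the rules here are defined only by example, the following is one
possible reading of them, not a reference implementation. It assumes that an
entry is a single word (optionally with a parenthesised argument list)
followed by a colon or a run of dashes, that the entry starts a paragraph or
directly follows another entry, and that anything indented deeper than the
entries is body text. The names ``ENTRY`` and ``identifier_list_names`` are
invented for this sketch.

```python
import inspect
import re

# Assumed shape of an entry: one word, optionally followed by a parenthesised
# argument list, then ":" or a run of dashes, then whitespace and some text.
ENTRY = re.compile(r"^([A-Za-z_]\w*)(?:\([^)]*\))?\s*(?::|-+)\s+\S")


def identifier_list_names(docstring):
    """Return the words that head identifier-list entries in a docstring."""
    names = []
    in_list = False    # True while inside an entry or its indented body
    prev_blank = True  # the start of the docstring counts as a blank line
    for line in inspect.cleandoc(docstring).splitlines():
        if not line.strip():
            prev_blank = True
            continue
        if line[0].isspace():
            # Indented text belongs to the current entry; it is never an entry.
            prev_blank = False
            continue
        match = ENTRY.match(line)
        if match and (prev_blank or in_list):
            names.append(match.group(1))
            in_list = True
        else:
            in_list = False
        prev_blank = False
    return names


doc = """register a function to be executed someday

    func - function to be called
    targs - optional arguments to pass
    kargs - optional keyword arguments to pass
    """
print(identifier_list_names(doc))  # prints ['func', 'targs', 'kargs']
```

Using ``inspect.cleandoc`` keeps the sketch independent of how deeply the
docstring is indented in the source file; whether nesting should be judged
by indentation at all is one of the points still open for discussion.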

Another case is things which look like functions, lists, indexes, or
dicts::


  afunction(is,a,word,with,parentheses)
  [a,list,is,a,bunch,of,words,in,brackets]
  anindex[is, like, a, cross, between, the, above]
  {adict:is,just:words,in:curly, brackets: likethis}

  #all of the above would be recognized as identifiers.
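
A minimal sketch of how the four shapes above might be matched follows. It
is deliberately narrower than the loose syntax this PEP has in mind: it
assumes words are runs of ``\w`` characters, accepts only ``,`` and ``:`` as
separators (the colon is an assumption needed for the dict example), and
does not handle nesting. ``CONSTRUCT`` and ``bracketed_words`` are names
made up for the example.

```python
import re

# Assumptions for this sketch: words are \w+, separators are "," or ":" with
# optional surrounding spaces, and nested brackets are not handled.
WORD = r"\w+"
SEP = r"\s*[,:]\s*"
BODY = WORD + "(?:" + SEP + WORD + ")*"

CONSTRUCT = re.compile("".join([
    r"(?P<head>\w+)?",                 # optional initial word, no space after it
    r"(?:\((?P<paren>", BODY, r")\)",  # afunction(...) or a bare (...)
    r"|\[(?P<square>", BODY, r")\]",   # anindex[...] or a bare [...]
    r"|\{(?P<curly>", BODY, r")\})",   # {...}
]))


def bracketed_words(text):
    """Yield (initial_word_or_None, content_words) for each construct found."""
    for m in CONSTRUCT.finditer(text):
        body = m.group("paren") or m.group("square") or m.group("curly")
        yield m.group("head"), re.findall(WORD, body)


for sample in ("afunction(is,a,word)", "anindex[is, like, a]", "{adict:is}"):
    print(list(bracketed_words(sample)))
```

Ordinary parenthetical prose containing spaces, such as "(see below)", is
not matched by this pattern, but a one-word parenthetical would be, so a
real rule set would need something stricter.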

The syntax of what goes inside these is very loose::

  identifier_list ::= [initial_word]opening_symbol content_word
                      {separator_symbol content_word} closing_symbol

with no whitespace after initial_word, and where separator_symbol is the
set of symbols ``.,{}[]+-*^%=|/()[]{}`` MINUS closing_symbol. content_word

Re: [Python-Dev] Extend reST spec to allow automatic recognition of identifiers in comments?

2008-01-06 Thread Jeroen Ruigrok van der Werven
-On [20080105 22:44], Jameson Chema Quinn ([EMAIL PROTECTED]) wrote:
> The syntax of what goes inside these is very loose.
> identifier_list ::= [initial_word]opening_symbol content_word
> {separator_symbol content_word} closing symbol
> , with no whitespace after initial_word, and where separator_symbol is the set
> of symbols .,{}[]+-*^%=|/()[]{} MINUS closing_symbol. content_word could
> maybe be a quoted string, too.
> In the function name, no whitespace
> is allowed, but the symbols .,*^=- are. Thus::

Given that Python 3k will use Unicode natively for strings and, IIRC,
UTF-8 as the standard encoding for Python files, have you thought about
half-width and full-width characters like the ones you describe above? 「」
are, for example, very common in Japanese where English would use quotes.

-- 
Jeroen Ruigrok van der Werven asmodai(-at-)in-nomine.org / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/
Possession is nine points of the law...

