I've been working on epydoc, and the question has come up of how I should treat non-unicode docstrings that contain non-ascii characters. An example of such a file is "python2.4/encodings/string_escape.py", whose module docstring contains an 'o' with an umlaut.
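
To make the problem concrete, here is a small sketch of what a tool is faced with when it gets an 8-bit docstring. The byte string and the list of candidate encodings are made up for illustration (they are not taken from string_escape.py), and `try_decode` is a hypothetical helper, not part of epydoc. It is written with an explicit bytes literal so that it behaves the same regardless of Python version:

```python
# Hypothetical illustration: RAW_DOCSTRING and try_decode are invented for
# this example and are not part of epydoc or of string_escape.py.

# The bytes of an 8-bit docstring as a tool would see them. The "o" with an
# umlaut is stored as the single byte 0xF6, which is its Latin-1 encoding.
RAW_DOCSTRING = b"Written by Bj\xf6rn."


def try_decode(raw, encoding):
    """Return the decoded text, or None if `raw` is not valid in `encoding`."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# 0xF6 is outside the ASCII range, so this fails.
as_ascii = try_decode(RAW_DOCSTRING, "ascii")

# 0xF6 is not a valid UTF-8 lead byte, so this fails too.
as_utf8 = try_decode(RAW_DOCSTRING, "utf-8")

# Latin-1 maps every byte to a character; here it gives the intended text.
as_latin1 = try_decode(RAW_DOCSTRING, "latin-1")

# cp437 also maps every byte, but 0xF6 becomes a different character,
# so the result is silently wrong rather than an error.
as_cp437 = try_decode(RAW_DOCSTRING, "cp437")
```

The bytes alone don't say which of these readings is right, so the tool has to get the encoding from somewhere.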

In particular, the question is whether I should assume that the docstring is encoded with the encoding specified by the "-*- coding -*-" directive at the top of the file.

The reason why we *wouldn't* use the encoding is that PEP 263 [1], which defines the coding directive, says that it does *not* apply to non-unicode string literals. In particular, PEP 263 says that the entire file should be read & tokenized using the specified coding, but once string objects are created, they should be reencoded back into 8-bit strings using the file encoding.

So the "correct" fix is for the author of the module to use unicode literals instead of string literals for docstrings that contain non-ascii characters. This has the advantage that if a user tries to look at the docstring via introspection, it will be correct. On the other hand, epydoc is often used by people other than the author of a module, and requiring them to go through and replace all string literal docstrings with unicode literals seems a bit unreasonable.

In a way, this is similar to the mistake I've seen many times of using non-escaped backslashes inside docstrings. E.g.:

    def wc(filename):
        """
        Count the number of words in the given file.  E.g.:

            >>> wc("c:\test\new.txt")
            100
        """

This looks fine in the source file, but looks quite broken if you print its __doc__, because "\t" and "\n" are interpreted as a tab and a newline:

    >>> print wc.__doc__

        Count the number of words in the given file.  E.g.:

            >>> wc("c:      est
    ew.txt")
            100

(The right fix in that case is probably to use a raw-string.)

So the question is: should epydoc (and other tools like it) be compliant with PEP 263 (and consistent with Python); or should they "do what I mean, not what I say" and treat non-ascii docstrings as if they were encoded using the module's encoding?

-Edward

[1] http://www.python.org/doc/peps/pep-0263/