On Sun, 28 Jun 2009 21:25:13 +0000, Benjamin Peterson wrote:

>> > The email module is, yes, broken. You can recover the bytestrings of
>> > command-line arguments and environment variables.
>>
>> 1. Does Python offer any assistance in doing so, or do you have to
>> manually convert the surrogates which are generated for unrecognised bytes?
>
> fs_encoding = sys.getfilesystemencoding()
> bytes_argv = [arg.encode(fs_encoding, "surrogateescape") for arg in sys.argv]
This results in an internal error:

    >>> "\udce4\udceb\udcef\udcf6\udcfc".encode("iso-8859-1", "surrogateescape")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    SystemError: Objects/bytesobject.c:3182: bad argument to internal function

[FWIW, the error corresponds to _PyBytes_Resize, which has a cautionary
comment almost as large as the code.]

The documentation gives the impression that "surrogateescape" is only
meaningful for decoding.

>> 2. How do you do this for non-invertible encodings (e.g. ISO-2022)?
>
> What's a non-invertible encoding? I can't find a reference to the term.

One where different inputs can produce the same output.
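
A minimal sketch of both points, assuming a current Python 3 release rather
than the build that raised the SystemError above (as far as I know, encoding
with "surrogateescape" works there). The helper name to_bytes and the sample
byte strings are made up purely for illustration. It shows the round trip
being byte-exact for UTF-8, and not byte-exact for ISO-2022-JP, where a
redundant escape sequence decodes to the same text as the plain bytes.

```python
import os
import sys


def to_bytes(text, encoding):
    """Re-encode a str that was decoded with errors="surrogateescape".

    Hypothetical helper, not part of the standard library.
    """
    return text.encode(encoding, "surrogateescape")


# Case 1: bytes that are not valid UTF-8 become lone surrogates on decoding
# and are turned back into the same bytes on encoding.
raw = b"\xe4\xeb\xef\xf6\xfc"
text = raw.decode("utf-8", "surrogateescape")
roundtrip_ok = to_bytes(text, "utf-8") == raw

# Case 2: ISO-2022-JP is stateful. ESC ( B selects ASCII, which is already
# the initial state, so both inputs decode to the same text without any
# surrogates being generated.
plain = b"abc"
redundant = b"\x1b(Babc"
same_text = (plain.decode("iso2022_jp", "surrogateescape")
             == redundant.decode("iso2022_jp", "surrogateescape"))

# Re-encoding cannot know which of the two byte strings it started from.
recovered = to_bytes(redundant.decode("iso2022_jp", "surrogateescape"),
                     "iso2022_jp")

# Assumption: Python 3.2 or later, where os.fsencode() is available and
# applies the filesystem encoding and its error handler for you.
bytes_argv = [os.fsencode(arg) for arg in sys.argv]

print(roundtrip_ok, same_text, recovered == redundant)
```

In case 2 the decoded text carries no trace of the redundant escape, so no
error handler can restore the original bytes; that is the sense of
"non-invertible" in question 2.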