On Mon, 29 Jan 2007 18:01:56 -0800, Jim wrote:

> Hello,
>
> I'm trying to write exception-handling code that is OK in the presence
> of unicode error messages. I seem to have gotten all mixed up and I'd
> appreciate any un-mixing that anyone can give me.

[snip]

>>> class MyException(Exception): pass
...
>>> fn = u'a\N{LATIN SMALL LETTER O WITH DIAERESIS}k'
>>> raise MyException("hello")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
__main__.MyException: hello
>>>

Works fine with an ASCII argument, but not with Unicode:

>>> raise MyException(fn)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
__main__.MyException>>>

Notice the terminal problem? (The error message doesn't print, and the
prompt ends up stuck after the exception.)

Let's capture the exception and dissect it:

>>> try: raise MyException(fn)
... except Exception, err:
...     print type(err)
...     print err
...
<type 'instance'>
Traceback (most recent call last):
  File "<stdin>", line 4, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in
position 1: ordinal not in range(128)

Now we have the answer: your exception (which just sub-classes Exception)
does the simplest conversion of Unicode to ASCII possible, and when it
hits a character it can't deal with, it barfs. That doesn't happen until
you try to print the exception, not when you create it.

The easiest ways to fix that are:

(1) subclass an exception that already knows about Unicode;

(2) convert the file name to ASCII before you store it; or

(3) add a __str__ method to your exception that is Unicode aware.

I'm going to be lazy and do a real simple-minded version of (2):

>>> class MyBetterException(Exception):
...     def __init__(self, arg):
...         self.args = arg.encode('ascii', 'replace')
...         self.unicode_arg = arg  # save the original in case
...
>>> raise MyBetterException(fn)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
__main__.MyBetterException: a?k

And now it works.

-- 
Steven D'Aprano
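
For comparison, here is a minimal sketch of what option (3), a __str__ method that is Unicode aware, might look like. It is an illustration only, not code from the session above: the class name UnicodeFriendlyError and its `text` attribute are made up for the example, it assumes the argument is a Unicode (text) string, and it uses the newer `except ... as` syntax rather than the `except Exception, err` form shown earlier.

```python
# Sketch of option (3): an exception whose __str__ never raises on
# non-ASCII text. UnicodeFriendlyError and `text` are hypothetical names.

class UnicodeFriendlyError(Exception):
    def __init__(self, text):
        Exception.__init__(self, text)
        self.text = text  # keep the original Unicode string untouched

    def __str__(self):
        # Replace every non-ASCII character with '?' instead of raising
        # UnicodeEncodeError, then turn the bytes back into text.
        return self.text.encode('ascii', 'replace').decode('ascii')


fn = u'a\N{LATIN SMALL LETTER O WITH DIAERESIS}k'

try:
    raise UnicodeFriendlyError(fn)
except UnicodeFriendlyError as err:
    shown = str(err)   # safe: only ASCII characters remain
    original = err.text

print(shown)  # prints: a?k
```

The lossy conversion happens only when the exception is displayed, so the original string is still available through the `text` attribute for anything that can handle it.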