Walter Dörwald wrote:
> Edward Loper wrote:
>
>> [...]
>> Surely there's a better way than converting back and forth 3 times? Is
>> there a reason that the 'backslashreplace' error mode can't be used
>> with codecs.decode?
>>
>> >>> 'abc \xff\xe8 def'.decode('ascii', 'backslashreplace')
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in ?
>> TypeError: don't know how to handle UnicodeDecodeError in error callback
>
> The backslashreplace error handler is an *error* *handler*, i.e. it
> gives you a replacement text if an input character can't be encoded. But
> a backslash character in an 8bit string is no error, so it won't get
> replaced on decoding.

I'm not sure I follow exactly -- the input string I gave as an example
did not contain any backslash characters, unless by "backslash
character" you mean a character c such that ord(c)>127. I guess it
depends on which class of errors you think the error handler should be
handling. :) The codec system's pretty complex, so I'm willing to
accept on faith that there may be a good reason to have error handlers
only make replacements in the encode direction, and not in the decode
direction.

> What you want is a different codec (try e.g. "string-escape" or
> "unicode-escape").

This is very close, but unfortunately won't quite work for my purposes,
because it also puts backslashes before "'" and "\\" and maybe a few
other characters. :-/

    >>> print "test: '\xff'".encode('string-escape').decode('ascii')
    test: \'\xff\'

    >>> print do_what_i_want("test: '\xff'")
    test: '\xff'

I think I'll just have to stick with rolling my own.

-Edward
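
For what it's worth, one way "rolling my own" could look is a custom
error callback registered with codecs.register_error. This is only a
sketch: the handler name "backslashreplace_decode" and the body of
do_what_i_want are made up for illustration, and it is written for
Python 3, where the input is a bytes object. Only the bytes the codec
actually rejects get escaped, so quotes and backslashes pass through
untouched.

```python
import codecs


def backslashreplace_decode(exc):
    """Error callback that escapes undecodable bytes as \\xNN text."""
    # Only decoding errors are handled; anything else is re-raised.
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # bytearray() yields integers, one per offending byte.
    replacement = "".join("\\x%02x" % b for b in bytearray(bad))
    # Resume decoding right after the bytes that were replaced.
    return replacement, exc.end


# Hypothetical handler name, chosen for this example only.
codecs.register_error("backslashreplace_decode", backslashreplace_decode)


def do_what_i_want(data):
    # Decode as ASCII, escaping only the bytes ASCII cannot represent.
    return data.decode("ascii", "backslashreplace_decode")


print(do_what_i_want(b"test: '\xff'"))
```

On Python 3.5 and later the built-in 'backslashreplace' handler also
accepts decoding errors, so there the custom callback should not be
needed at all.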