Michael Goerz wrote:
> Hi,
>
> I am writing unicode strings into a special text file that requires
> non-ASCII characters to be written as octal-escaped UTF-8 codes.
>
> For example, the letter "Í" (Latin capital I with acute, code point 205)
> would come out as "\303\215".
>
> I will also have to read back from the file later on and convert the
> escaped characters back into a unicode string.
>
> Does anyone have any suggestions on how to go from "Í" to "\303\215" and
> vice versa?
>
> I know I can get the code point by doing
> >>> "Í".decode('utf-8').encode('unicode_escape')
> but there doesn't seem to be any similar method for getting the octal
> escaped version.
>
> Thanks,
> Michael

I've come up with the following solution. It's not very pretty, but it
works (no bugs, I hope). Can anyone think of a better way to do it?

Michael
_________

import binascii

def escape(s):
    # s is a UTF-8 encoded byte string; every byte (ASCII included)
    # is written out as a backslash plus three octal digits.
    hexstring = binascii.b2a_hex(s)
    result = ""
    while len(hexstring) > 0:
        (hexbyte, hexstring) = (hexstring[:2], hexstring[2:])
        octbyte = oct(int(hexbyte, 16)).zfill(3)
        result += "\\" + octbyte[-3:]
    return result

def unescape(s):
    # Returns a byte string; decode it with .decode('utf-8') to get unicode.
    result = ""
    while len(s) > 0:
        if s[0] == "\\":
            (octbyte, s) = (s[1:4], s[4:])
            try:
                result += chr(int(octbyte, 8))
            except ValueError:
                # Not a valid octal escape: keep the backslash literally
                # and put the three characters back for normal processing.
                result += "\\"
                s = octbyte + s
        else:
            result += s[0]
            s = s[1:]
    return result

print escape("\303\215")
print unescape('adf\\303\\215adf')

--
http://mail.python.org/mailman/listinfo/python-list
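
For comparison, a shorter sketch of the same round trip is given below. It assumes Python 3 (where str is Unicode and bytes is a separate type), which is not what the code above was written for. The function names mirror escape() and unescape() from the post. Unlike that version, this one leaves ASCII characters untouched and escapes only the bytes of non-ASCII characters. The regular expression, which accepts only \000 to \377, is an assumption about what the file format allows.

```python
import re

def escape(text):
    """Replace each non-ASCII character with the octal escapes of its
    UTF-8 bytes; ASCII characters pass through unchanged."""
    pieces = []
    for byte in text.encode("utf-8"):
        if byte < 0x80:
            pieces.append(chr(byte))
        else:
            # %03o formats the byte as exactly three octal digits.
            pieces.append("\\%03o" % byte)
    return "".join(pieces)

# A backslash followed by three octal digits in the range 000-377.
_OCTAL_ESCAPE = re.compile(rb"\\([0-3][0-7]{2})")

def unescape(text):
    """Turn octal escapes back into bytes, then decode the result as UTF-8."""
    raw = text.encode("utf-8")
    raw = _OCTAL_ESCAPE.sub(lambda m: bytes([int(m.group(1), 8)]), raw)
    return raw.decode("utf-8")

print(escape("\u00cd"))                                 # \303\215
print(unescape("adf\\303\\215adf") == "adf\u00cdadf")   # True
```

One limitation is shared with the original code: a literal backslash followed by three octal digits in the input text cannot be told apart from an escape when reading back, so such text does not survive the round trip unless the file format defines a way to escape the backslash itself.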