I have several ASCII files that contain '\ooo' strings, each of which represents the octal value of a character. I want to convert these files to Unicode, and I came up with the following script. But it seems to me that there must be a much simpler way to do it. Could someone more experienced suggest some improvements?
For example, I want to convert a file containing:

hello \326du

into a Unicode file containing:

hello Ödu

----------8<---------------------------------------
#!/usr/bin/python
import re, string, sys

# Read all lines from the file named on the command line.
if len(sys.argv) > 1:
    file = open(sys.argv[1],'r')
    lines = file.readlines()
    file.close()
else:
    print "give a filename"
    sys.exit()

# Turn a matched '\ooo' escape into the corresponding Unicode character.
def to_unichr(str):
    oct = string.atoi(str.group(1),8)
    return unichr(oct)

for line in lines:
    line = string.rstrip(unicode(line,'Latin-1'))
    if re.compile(r'\\\d\d\d').search(line):
        line = re.sub(r'\\(\d\d\d)', to_unichr, line)
    # Write every line out as UTF-8.
    line = line.encode('utf-8')
    print line
----------8<---------------------------------------
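For comparison, here is a shorter variant I could imagine, offered only as a sketch and not as a known-good answer. It assumes Python 3 (where chr() covers what unichr() did in the script above), and it assumes every escape is a backslash followed by exactly three octal digits whose value fits in one byte. The names OCTAL_ESCAPE and decode_octal_escapes are made up for this illustration.

```python
import re

# Assumption: an escape is a backslash plus exactly three octal digits,
# with the first digit at most 3 so the value stays within 0-255.
OCTAL_ESCAPE = re.compile(r'\\([0-3][0-7]{2})')


def decode_octal_escapes(text):
    """Replace each \\ooo escape with the character whose code point is ooo (octal)."""
    # re.sub leaves the text unchanged when nothing matches, so no
    # separate search() is needed before substituting.
    return OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), text)
```

With this, decode_octal_escapes(r'hello \326du') should give 'hello Ödu'. Under the same assumptions, the input file could be opened with encoding='latin-1' and the output file with encoding='utf-8', so the encoding and decoding happen in open() and not by hand on each line.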