[EMAIL PROTECTED] wrote:
> Question: what is a good strategy for taking an 8bit
> string of unknown encoding and recovering the largest
> amount of reasonable information from it (translated to
> utf8 if needed)?

Copy the string unmodified to the WWW page and ensure your page doesn't
identify the encoding used. That way it becomes the browser's problem,
and if the user reading the page can understand the language the string
is written in there's a very good chance the browser will display it
correctly. Unfortunately, that's how text like this is supposed to be
displayed.

> The output must be clean utf8 suitable for arbitrary xml parsers.

Oh, you're screwed then.

Ross Ridge

-- 
http://mail.python.org/mailman/listinfo/python-list
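
When the output really does have to be clean UTF-8 for an XML parser, the true encoding cannot be recovered in general, so any conversion is a guess. A best-effort version might look like the sketch below. Everything specific in it is an assumption made for illustration: the function name `to_clean_utf8` is hypothetical, the fallback order (UTF-8, then cp1252, then latin-1) is only one plausible choice, and the character filter assumes XML 1.0. A wrong guess still produces well-formed output, just with the wrong characters in it.

```python
import re

# Matches any character that XML 1.0 does not allow in a document
# (assumption: the consumers are XML 1.0 parsers).
_XML_ILLEGAL = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def to_clean_utf8(raw: bytes, fallbacks=("cp1252", "latin-1")) -> bytes:
    """Best-effort conversion of bytes of unknown encoding to XML-safe UTF-8.

    Hypothetical helper: tries strict UTF-8 first, then each fallback codec
    in order. The result is a guess, not a recovery of the original text.
    """
    try:
        # Valid UTF-8 is very unlikely to occur by accident, so try it first.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        for encoding in fallbacks:
            try:
                text = raw.decode(encoding)
                break
            except UnicodeDecodeError:
                continue
        else:
            # No fallback worked: keep what decodes, mark the rest as U+FFFD.
            text = raw.decode("utf-8", errors="replace")

    # Replace characters an XML 1.0 parser would reject (e.g. NUL).
    text = _XML_ILLEGAL.sub("\ufffd", text)
    return text.encode("utf-8")
```

Because latin-1 maps every byte to a character, keeping it last in the list means the function always returns something; whether that something is readable depends entirely on whether the guess was right.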