On Thu, 08 Aug 2013 17:24:49 +0200, Kurt Mueller wrote:

> What do I do, when input_strings/output_list has other codings like
> iso-8859-1?
When reading from a text file, honour some sort of encoding cookie at the
top (or bottom) of the file, like Emacs and Vim use, or a BOM. If there is
no encoding cookie, assume UTF-8.

When reading from stdin, assume UTF-8.

Otherwise, make it the caller's responsibility to specify the encoding if
they wish to use something else.

Pseudo-code:

    encoding = None
    if command line arguments include '--encoding':
        encoding = --encoding argument
    if encoding is None:
        if input file is stdin:
            encoding = 'utf-8'
        else:
            open file as binary
            if first 2-4 bytes look like a BOM:
                encoding = one of UTF-8 or UTF-16 or UTF-32
            else:
                read first two lines
                if either looks like an encoding cookie:
                    encoding = cookie
                # optionally check the end of the file as well
            close file
    if encoding is None:
        encoding = 'utf-8'
    read from file using encoding

--
Steven
--
http://mail.python.org/mailman/listinfo/python-list
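
For illustration, the pseudo-code above could be written in Python roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the function name guess_encoding, the convention that a path of "-" means stdin, and the deliberately loose cookie pattern are all invented for the example. It checks only the first two lines for a cookie and skips the optional end-of-file check.

```python
import codecs
import re

# BOM table. The 4-byte UTF-32 marks are tested before the 2-byte UTF-16
# marks, because the UTF-32-LE BOM begins with the UTF-16-LE BOM.
# The codec names chosen here ("utf-8-sig", "utf-16", "utf-32") consume
# the BOM themselves when decoding.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# Assumption: a loose pattern that accepts both the Emacs style
# ("-*- coding: latin-1 -*-") and the Vim style ("fileencoding=latin-1").
COOKIE_RE = re.compile(rb"coding[:=]\s*([-\w.]+)")


def guess_encoding(path, explicit=None, default="utf-8"):
    """Return the name of the encoding to use when reading `path`.

    `explicit` stands in for the value of a --encoding argument.
    A `path` of "-" is taken to mean stdin (an assumption of this sketch).
    """
    if explicit is not None:
        return explicit          # the caller specified it; trust the caller
    if path == "-":
        return default           # stdin: assume UTF-8
    with open(path, "rb") as f:
        head = f.read(4)
        for bom, name in BOMS:
            if head.startswith(bom):
                return name
        f.seek(0)
        for _ in range(2):       # look for a cookie in the first two lines
            match = COOKIE_RE.search(f.readline())
            if match:
                return match.group(1).decode("ascii")
    return default               # no BOM, no cookie: assume UTF-8
```

The result can then be handed to the built-in open(), for example open(path, encoding=guess_encoding(path)). An unknown or misspelled name in a cookie will only surface as a LookupError at that point, so a caller may want to validate it with codecs.lookup() first.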