On Thu, Jan 09, 2014 at 01:00:59PM +0000, Kristján Valur Jónsson wrote:
> Which reminds me, can Python3 read text files with BOM automatically yet?
I'm not sure what you mean by that. If you mean "can Python3 distinguish between UTF-16BE and UTF-16LE on the basis of a BOM?", then it has been able to do that for a long time:

    steve@orac:~$ hexdump sample-utf-16.txt
    0000000 feff 0048 0065 006c 006c 006f 0020 0057
    0000010 006f 0072 006c 0064 0021 000a 00a2 00a3
    0000020 00a7 2022 00b6 00df 03c0 2248 2206 000a
    0000030
    steve@orac:~$ python3.1 -c "print(open('sample-utf-16.txt', encoding='utf-16').read())"
    Hello World!
    ¢£§•¶ßπ≈∆

If you mean "will Python assume that the bytes FEFF or FFFE at the start of a file mean that it is encoded in UTF-16?", then as far as I know the answer is "no". Without an explicit encoding, open() uses the default encoding (UTF-8 here) and fails on the BOM:

    [steve@ando ~]$ python3.3 -c "print(open('sample-utf-16.txt').read())"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/usr/local/lib/python3.3/codecs.py", line 300, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

I wouldn't want it to guess the encoding by default. See the Zen of Python on ambiguity: "In the face of ambiguity, refuse the temptation to guess."

-- 
Steven
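For anyone who does want BOM detection, it can be done explicitly instead of by default. The following is a minimal sketch, assuming the only goal is to choose among the standard codecs based on a leading BOM. `sniff_bom`, `decode_with_bom` and `read_text_with_bom` are hypothetical helper names, not stdlib functions; the `codecs.BOM_*` constants and the "utf-16", "utf-32" and "utf-8-sig" codecs are real, and each of those codecs strips the BOM while decoding (the first two also take the byte order from it).

```python
import codecs

# Hypothetical table, not a stdlib API: (BOM bytes, codec name) pairs.
# UTF-32 BOMs are tested first because BOM_UTF32_LE (ff fe 00 00)
# starts with BOM_UTF16_LE (ff fe).
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def sniff_bom(raw: bytes, default: str = "utf-8") -> str:
    """Return a codec name chosen from a leading BOM, or `default` if none."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return default


def decode_with_bom(raw: bytes, default: str = "utf-8") -> str:
    """Decode bytes, using the BOM (if any) to pick the codec."""
    # "utf-16", "utf-32" and "utf-8-sig" consume the BOM while decoding.
    return raw.decode(sniff_bom(raw, default))


def read_text_with_bom(path: str, default: str = "utf-8") -> str:
    """Opt-in BOM detection for a file; `default` applies when there is no BOM."""
    with open(path, "rb") as f:
        return decode_with_bom(f.read(), default)


if __name__ == "__main__":
    sample = "Hello World!\n¢£§•¶ßπ≈∆\n"
    little = codecs.BOM_UTF16_LE + sample.encode("utf-16-le")
    big = codecs.BOM_UTF16_BE + sample.encode("utf-16-be")
    for raw in (little, big):
        # Both byte orders round-trip once the BOM is honoured.
        assert decode_with_bom(raw) == sample
        try:
            # Plain UTF-8 decoding, as in the traceback where UTF-8 was the default.
            raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            print("utf-8 fails:", exc.reason)
```

Even this opt-in version has to guess in one case: a UTF-16-LE file whose first character is U+0000 begins with ff fe 00 00, which is exactly the UTF-32-LE BOM. That is the kind of ambiguity that argues against doing it by default. For the common UTF-8-with-BOM case, passing encoding='utf-8-sig' to open() is usually enough, since that codec skips a leading BOM if present and otherwise decodes as plain UTF-8.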