On Thu, Jan 09, 2014 at 01:00:59PM +0000, Kristján Valur Jónsson wrote:

> Which reminds me, can Python3 read text files with BOM automatically yet?

I'm not sure what you mean by that. If you mean, can Python3 distinguish 
between UTF-16BE and UTF-16LE on the basis of a BOM, then it's been able 
to do that for a long time:

steve@orac:~$ hexdump sample-utf-16.txt
0000000 feff 0048 0065 006c 006c 006f 0020 0057
0000010 006f 0072 006c 0064 0021 000a 00a2 00a3
0000020 00a7 2022 00b6 00df 03c0 2248 2206 000a
0000030
steve@orac:~$ python3.1 -c "print(open('sample-utf-16.txt', encoding='utf-16').read())"
Hello World!
¢£§•¶ßπ≈∆
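
The same behaviour can be checked without a pre-made sample file. This 
is only a minimal sketch: the helper name roundtrip and the use of a 
temporary file are my own choices for illustration. It writes the text 
once with a little-endian BOM and once with a big-endian BOM, and reads 
both back with the plain 'utf-16' codec:

```python
import codecs
import os
import tempfile

text = "Hello World!\n¢£§•¶ßπ≈∆\n"

def roundtrip(raw_bytes):
    # Hypothetical helper: write raw bytes to a temporary file, then
    # read the file back in text mode using the generic 'utf-16' codec.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw_bytes)
        # newline="" turns off newline translation so the text compares equal.
        with open(path, encoding="utf-16", newline="") as f:
            return f.read()
    finally:
        os.remove(path)

# Same text, opposite byte orders, each prefixed with the matching BOM.
le = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
be = codecs.BOM_UTF16_BE + text.encode("utf-16-be")

# The codec picks the byte order from the BOM and strips the BOM itself.
assert roundtrip(le) == text
assert roundtrip(be) == text
```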


If you mean, "Will Python assume that the presence of bytes FEFF or FFFE
at the start of a file means that it is encoded in UTF-16?", then as 
far as I know, the answer is "No":

[steve@ando ~]$ python3.3 -c "print(open('sample-utf-16.txt').read())"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.3/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
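
A caller who does want BOM-based detection can opt in explicitly. The 
following is a rough sketch, not anything open() does for you: the names 
sniff_bom and open_sniffed are hypothetical, and the fallback to UTF-8 
when no BOM is found is an assumption. It is still a guess; for 
instance, a UTF-16-LE file whose first character is U+0000 would be 
mistaken for UTF-32.

```python
import codecs

# Order matters: the UTF-32-LE BOM (FF FE 00 00) begins with the
# UTF-16-LE BOM (FF FE), so the UTF-32 marks must be tested first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def sniff_bom(path, default="utf-8"):
    # Hypothetical helper: look at the first four bytes of the file and
    # return a codec name that will consume the BOM, or the default.
    with open(path, "rb") as f:
        head = f.read(4)
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default

def open_sniffed(path, default="utf-8"):
    # Hypothetical helper: open for reading in text mode with the
    # sniffed encoding.
    return open(path, encoding=sniff_bom(path, default))
```

With the sample file above, open_sniffed('sample-utf-16.txt') would 
select the 'utf-16' codec instead of failing in the UTF-8 decoder.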


I wouldn't want it to guess the encoding by default. See the Zen of 
Python: "In the face of ambiguity, refuse the temptation to guess."


-- 
Steven