Unless someone else does the implementation, I'd rather add a utf8-readsig 
encoding that initially only skips a UTF-8 BOM. Notably, you always get the 
same encoding; it just sometimes skips the first three bytes.
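
A rough sketch of the read side, to make the behaviour concrete. The 
function names are hypothetical and "utf8-readsig" is not an existing codec. 
The write side is an assumption on my part: the name suggests the signature 
only matters when reading, so this version never writes a BOM. (The existing 
"utf-8-sig" codec also skips a BOM when decoding, but writes one when 
encoding.)

```python
import codecs


def decode_utf8_readsig(data: bytes) -> str:
    """Decode as UTF-8, dropping a leading UTF-8 BOM if one is present.

    Hypothetical helper: the encoding is always UTF-8; the only variation
    is whether the first three bytes (EF BB BF) are skipped.
    """
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.decode("utf-8")


def encode_utf8_readsig(text: str) -> bytes:
    """Encode as plain UTF-8 without a BOM (assumed write behaviour)."""
    return text.encode("utf-8")
```

An empty file decodes to an empty string either way, so there is nothing 
special to do for that case.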

I think we can change this later to detect and switch to utf16 without it being 
disastrous, though we've made it this far without it and frankly there are good 
reasons to "encourage" utf8 over utf16.
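
For illustration only, a later version that switches on a UTF-16 BOM might 
look something like the following. The function name is hypothetical, and 
this assumes UTF-32 is ignored (its little-endian BOM begins with the same 
two bytes as the UTF-16 LE one, so a real implementation would need to 
decide how to handle that).

```python
import codecs


def decode_with_bom_detection(data: bytes) -> str:
    """Pick the encoding from a leading BOM, defaulting to UTF-8.

    Hypothetical sketch: a UTF-8 BOM is skipped, a UTF-16 BOM switches the
    decoder to UTF-16, and anything else is treated as UTF-8.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec consumes the BOM and uses it to choose the
        # byte order.
        return data.decode("utf-16")
    # No recognised BOM: treat the data as UTF-8.
    return data.decode("utf-8")
```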

My big concern is the console... I think that change is inevitably going to 
have to break someone, but I need to map out the possibilities first to figure 
out just how bad it'll be.

Top-posted from my Windows Phone

-----Original Message-----
From: "Random832" <random...@fastmail.com>
Sent: 8/11/2016 7:54
To: "python-ideas@python.org" <python-ideas@python.org>
Subject: Re: [Python-ideas] Fix default encodings on Windows

On Thu, Aug 11, 2016, at 10:25, Steven D'Aprano wrote:
> > Interesting. Are you assuming that a text file cannot be empty?
> 
> Hmmm... not consciously, but I guess I was.
> 
> If the file is empty, how do you know it's text?

Heh. That's the *other* thing that Notepad does wrong in the opinion of
people coming from the Unix world - a Windows text file does not need to
end with a [CR]LF, and normally will not.

> But we're getting off topic here. In context of Steve's suggestion, we 
> should only autodetect UTF-8. In other words, if there's a UTF-8 BOM, 
> skip it, otherwise treat the file as UTF-8.

I think there's still room for UTF-16. It's two of the four encodings
supported by Notepad, after all.
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/