Re: [Python-3000] content-based detection

Antoine Pitrou Sun, 10 Sep 2006 12:58:03 -0700

On Sunday, 10 September 2006 at 11:30 -0700, Paul Prescod wrote:
> I don't mind your name of autotextfile but I think that your
> by_content argument defeats the goal of having a very simple API for
> quick and dirty stuff. If content detection is a good idea (usually
> right) then we should do it.


Using the system or locale default is trustworthy and reproducible.
Content-based detection is less predictable, especially if the algorithm
isn't fully refined in the first Py3k releases.
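
To illustrate the reproducible option, here is a minimal sketch that
decodes bytes with the locale's preferred encoding. It relies on
locale.getpreferredencoding() from the standard library; the helper name
decode_with_locale_default is hypothetical and not part of any proposed
API.

```python
import locale


def decode_with_locale_default(data):
    """Decode bytes using the locale's preferred encoding.

    Hypothetical helper for illustration only. The result depends solely
    on the environment, never on the file's content.
    """
    # do_setlocale=False avoids changing the process-wide locale as a
    # side effect of the query.
    encoding = locale.getpreferredencoding(False)
    return data.decode(encoding), encoding
```

The same input always decodes the same way on a given system, which is
the reproducibility argument made above.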

> I can't see an argument for ever turning off the BOM detection. 

Perhaps, but having a subset of it still running behind your back after
you have disabled it is misleading.

Also, I think having BOM detection as the only test in content-based
detection would be uninteresting. The common use case for encoding
detection is to choose between a Unicode variant (mostly UTF-8,
*with or without BOM*) and the non-Unicode encoding which is popular for
a given language (e.g. ISO-8859-15).
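
A rough sketch of that common case, assuming a simple "try UTF-8 first,
else fall back" heuristic rather than any algorithm actually proposed in
this thread. The function name guess_decode and its fallback parameter
are hypothetical; the "utf-8-sig" codec is a real standard-library codec
that accepts UTF-8 with or without a BOM.

```python
def guess_decode(data, fallback="iso-8859-15"):
    """Decode as UTF-8 if the bytes are valid UTF-8, else use `fallback`.

    Hypothetical helper: a heuristic, not a reliable detector.
    """
    try:
        # "utf-8-sig" strips a leading BOM when present and otherwise
        # behaves like plain UTF-8.
        return data.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        # ISO-8859-15 maps every byte value, so this fallback decode
        # cannot fail, but it may still be the wrong guess.
        return data.decode(fallback), fallback
```

This works reasonably well because non-ASCII text in a legacy 8-bit
encoding is rarely valid UTF-8 by accident, but it can still guess wrong,
which is the predictability concern raised earlier.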

I doubt many people have to discriminate between UTF-16LE, UCS-4 and
UTF-8. Are there real cases like that for text files?
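
For reference, telling these encodings apart by BOM alone is
mechanically simple. The sketch below uses the BOM constants from the
standard codecs module; the name sniff_bom and its return convention are
hypothetical.

```python
import codecs

# Check the 4-byte UTF-32 BOMs before the 2-byte UTF-16 ones: the
# UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is found.

    Hypothetical helper for illustration only.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

One known ambiguity remains: UTF-16LE text whose first character after
the BOM is U+0000 is byte-identical to a UTF-32LE BOM, so even this test
is a guess.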

Regards

Antoine.


