On Thu, 14 Oct 2010 16:41:13 +1300, Lawrence D'Oliveiro wrote:

> In message <mailman.1544.1286800257.29448.python-l...@python.org>, Ethan
> Furman wrote:
>
>> Lawrence D'Oliveiro wrote:
>>
>>> In message <mailman.1533.1286774527.29448.python-l...@python.org>,
>>> Ethan Furman wrote:
>>>
>>>>Lawrence D'Oliveiro wrote:
>>>>
>>>>>In message <mailman.1466.1286556950.29448.python-l...@python.org>,
>>>>>Ethan Furman wrote:
>>>>>
>>>>MS treats those first three bytes as a flag -- if they equal the BOM,
>>>>MS treats it as UTF-8, if they equal anything else, MS does not treat
>>>>it as UTF-8.
>>>
>>> So what does it treat it as? You previously gave examples of flag
>>> values for dBase III. What are the flag values for Windows-1252,
>>> versus, say, ISO-8859-15?
>>
>> I am not aware of any other flag values for text files besides the BOM
>> for UTF-8.
>
> Then how can you say “MS treats those first three bytes as a flag”,
> then?

Because Microsoft tools treat those first three bytes as a flag. An
*optional* flag, but still a flag.

If the first three bytes of a text file equal the UTF-8 BOM, most MS
tools treat them as a BOM. If they equal any other value, then they are
not treated as a BOM, but merely part of the file's contents.

http://blogs.msdn.com/b/oldnewthing/archive/2004/03/24/95235.aspx
http://blogs.msdn.com/b/oldnewthing/archive/2007/04/17/2158334.aspx

It's not just Notepad either:

http://support.microsoft.com/kb/301623
http://msdn.microsoft.com/en-us/library/cc295463.aspx

The Python interpreter does the same thing too:

http://docs.python.org/reference/lexical_analysis.html#encoding-declarations

-- 
Steven
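
P.S. For anyone who wants to see the "optional flag" behaviour concretely, here is a minimal sketch in Python 3. The helper name split_utf8_bom is made up for illustration; codecs.BOM_UTF8 and the utf-8-sig codec are in the standard library. It only shows the idea, and makes no claim about how Notepad or any other Microsoft tool is implemented internally.

```python
import codecs


def split_utf8_bom(data: bytes):
    """Hypothetical helper: return (has_bom, payload).

    If the first three bytes equal the UTF-8 BOM (EF BB BF), they are
    treated as a flag and stripped. Any other leading bytes are left
    alone as ordinary file contents.
    """
    bom = codecs.BOM_UTF8  # b'\xef\xbb\xbf'
    if data.startswith(bom):
        return True, data[len(bom):]
    return False, data


# The flag is present: it is recognised and removed.
flagged = split_utf8_bom(b"\xef\xbb\xbfabc")

# The flag is absent: nothing is removed, the bytes are just content.
unflagged = split_utf8_bom(b"abc")

# The standard utf-8-sig codec applies the same rule when decoding:
# it skips a leading BOM if there is one and otherwise decodes as UTF-8.
text_with_bom = b"\xef\xbb\xbfabc".decode("utf-8-sig")
text_without_bom = b"abc".decode("utf-8-sig")
```

Note that when there is no BOM the sketch does not guess an encoding. The absence of the flag says nothing about whether the file is Windows-1252, ISO-8859-15 or anything else, so the caller has to decide that some other way.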