Bertrand Delacretaz wrote:
On 9 Dec 04, at 09:21, Leszek Gawron wrote:

...By the way: it is a little bit different on win32. Some tools detect a UTF encoding by checking for a BOM. If there is none, an ANSI encoding is assumed...


AFAIU this is ok for 16-bit based encodings, not for UTF-8.

-Bertrand
http://www.xencraft.com/resources/unicodebom.html
<quote>
Even though UTF-8 does not need a BOM to indicate endianness, Microsoft Notepad began prepending a BOM to its UTF-8 text files. Actually, it is a conversion of U+FEFF to an encoding as UTF-8 serialized bytes: EF BB BF (or in 4GL: CHR(15711167)). There is some value in the BOM being used as a file signature, indicating the plain text file is encoded as Unicode UTF-8, as opposed to some other code page. That particular 3-byte sequence is unlikely to represent data in any other code page, given the text is supposed to be human readable in some language. However, there is some small possibility that it represents some string in some code page... Because Microsoft did it, and there is so much Notepad data out there, the UTF-8 BOM became a de facto standard and then a de jure standard. (Although the BOM is optional.)
</quote>


M$ again.
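
For what it's worth, here is a rough Python sketch of the kind of BOM sniffing such tools presumably do. The function name sniff_bom is made up for illustration, and what a caller does when no BOM is found (e.g. falling back to the local ANSI code page) is an assumption left to the caller. It only uses the BOM constants from the standard codecs module.

```python
import codecs
from typing import Optional

# U+FEFF serialized as UTF-8 gives the three bytes EF BB BF,
# which read as one big-endian integer is 15711167.
assert "\ufeff".encode("utf-8") == codecs.BOM_UTF8 == b"\xef\xbb\xbf"
assert int.from_bytes(codecs.BOM_UTF8, "big") == 15711167

# UTF-32 LE must be tested before UTF-16 LE, because the UTF-32 LE BOM
# (FF FE 00 00) starts with the UTF-16 LE BOM (FF FE).
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def sniff_bom(data: bytes) -> Optional[str]:
    """Hypothetical helper: return the encoding hinted at by a leading BOM.

    Returns None when there is no BOM; the caller then has to guess
    (for example, assume an ANSI code page, as some win32 tools do).
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

A file saved by Notepad as UTF-8 would start with EF BB BF, so sniff_bom would report "utf-8" for it, while a BOM-less UTF-8 file yields None and is indistinguishable from any other 8-bit encoding by this check alone.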

--
Leszek Gawron                                      [EMAIL PROTECTED]
Project Manager                                    MobileBox sp. z o.o.
+48 (61) 855 06 67                              http://www.mobilebox.pl
mobile: +48 (501) 720 812                       fax: +48 (61) 853 29 65
