Francis Girard wrote:
On Monday 7 March 2005 21:54, "Martin v. Löwis" wrote:

Hi,

Thank you for your very informative answer. Some interspersed remarks follow.


I personally would write my applications so that they put the signature
into files that cannot be concatenated meaningfully (since the
signature simplifies encoding auto-detection) and leave out the
signature from files which can be concatenated (as concatenating the
files will put the signature in the middle of a file).



Well, there are no text files that can't be concatenated! Sooner or later, someone will use "cat" on the text files your application generated. That will be a lot of fun for the new Unicode-aware "super-cat".


It is my understanding that the BOM (U+FEFF) is actually the Unicode character "ZERO WIDTH NO-BREAK SPACE". I take this to mean that the character can appear invisibly anywhere in text, and that its appearance as the first character of a text is pretty harmless. Concatenating files will leave invisible space characters in the middle of the text, but presumably not in the middle of words, so no harm is done there either.
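
As a rough illustration of the concatenation case, here is a minimal sketch assuming a current Python 3, where the "utf-8-sig" codec is available. The two byte strings are made-up stand-ins for files written with a signature; nothing here comes from a real application.

```python
import codecs

# Two hypothetical "files", each written as UTF-8 with a leading signature (BOM).
part_a = codecs.BOM_UTF8 + "Hello, ".encode("utf-8")
part_b = codecs.BOM_UTF8 + "world".encode("utf-8")

# What "cat a.txt b.txt" would produce: a plain byte-wise concatenation.
combined = part_a + part_b

# "utf-8-sig" strips a signature only at the very start of the data,
# so the second file's signature survives as U+FEFF inside the text.
text = combined.decode("utf-8-sig")

print(repr(text))
print(text.count("\ufeff"))
```

The decoded text still contains one U+FEFF between the two parts, which is the invisible leftover described above. Whether that is harmless depends on what later consumes the text.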


I suspect it is no accident that the byte-reversed counterpart of the explicitly invisible character U+FEFF is U+FFFE, which is not a valid character, and that the character was intended from inception to also serve as a byte-order indication.
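
The byte-order role can be sketched in a few lines of Python. The helper name guess_utf16_byte_order is hypothetical and only looks at UTF-16; a real detector would also have to consider UTF-32, whose little-endian signature begins with the same two bytes.

```python
import codecs

def guess_utf16_byte_order(data: bytes) -> str:
    """Hypothetical helper: report the byte order implied by a leading BOM."""
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "little-endian"
    return "unknown"

# Encode U+FEFF explicitly in each byte order; these codecs add no BOM themselves.
be = "\ufeffhi".encode("utf-16-be")
le = "\ufeffhi".encode("utf-16-le")

print(guess_utf16_byte_order(be))
print(guess_utf16_byte_order(le))
print(guess_utf16_byte_order(b"hi"))
```

Because FF FE read in the wrong order would yield U+FFFE, which is not a valid character, a reader that sees it can conclude that the bytes need swapping rather than mistake it for text.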

Steve
--
http://mail.python.org/mailman/listinfo/python-list