Bruno> Certainly, yes. The one who chooses to convert anything to UTF-16
Bruno> should know about this; and this is why the RFC is explicit about
Bruno> it. The RFC also says that you can avoid the BOM problem by using
Bruno> UTF-16LE and UTF-16BE instead of UTF-16.
In the introduction to Unicode for new students and staff, I give them three
"rules" about the Byte Order Mark:
1. Use BOMs when reading and writing files.
2. Do not use BOMs when working with strings inside applications.
3. There will be special cases which are handled individually.
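
As a rough illustration of rules 1 and 2, here is a minimal Python sketch.
It is not our actual tooling, and the helper names write_with_bom and
read_with_bom are made up for this example. It relies on Python's "utf-16"
codec, which writes a BOM in the machine's native byte order and consumes it
again on reading, so the string inside the application never carries one.

```python
def write_with_bom(path, text):
    """Rule 1 (writing): store text as UTF-16 with a leading BOM."""
    # The plain "utf-16" codec emits a BOM in the platform's native byte order.
    # newline="" keeps line endings exactly as given.
    with open(path, "w", encoding="utf-16", newline="") as f:
        f.write(text)


def read_with_bom(path):
    """Rule 1 (reading) and rule 2: use the BOM to pick the byte order,
    then hand back a string that no longer contains it."""
    # The "utf-16" codec reads the BOM, selects the byte order from it,
    # and does not include U+FEFF in the decoded result.
    with open(path, "r", encoding="utf-16", newline="") as f:
        return f.read()
```

The point of the split is that the BOM lives only at the file boundary: it is
added on the way out and dropped on the way in, and nothing in between has to
know which byte order was used.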
We don't specifically use UTF-16LE and UTF-16BE. That requires knowing the
endian order a priori or requires some form of markup (out-of-band info).
From experience we know that staff, students, and customers just don't
remember which endian order we agreed to use, and we deal with 5-10
different forms of markup every day in very large bodies of text (gigabytes);
choosing one form of markup is not economically practical for us.
The cheapest solution was to use BOMs consistently.
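
To make the "a priori" problem concrete, here is a small Python sketch. The
function decode_utf16 and its fallback parameter are hypothetical names for
this example only. With a BOM the byte order is read from the data itself;
without one the caller has to supply it out of band, and a wrong guess
silently produces different text rather than an error.

```python
import codecs


def decode_utf16(data, fallback=None):
    """Decode UTF-16 bytes, taking the byte order from the BOM if present.

    `fallback` stands in for out-of-band information: a codec name such as
    "utf-16-le" or "utf-16-be" that is used only when no BOM is found.
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if fallback is None:
        raise ValueError("no BOM found and no byte order supplied")
    return data.decode(fallback)


# Bytes written as UTF-16BE without a BOM:
unmarked = "BOM".encode("utf-16-be")

# The right out-of-band byte order recovers the text ...
right = decode_utf16(unmarked, fallback="utf-16-be")

# ... while the wrong one decodes without complaint into other characters.
wrong = decode_utf16(unmarked, fallback="utf-16-le")
```

Here `right` is "BOM" while `wrong` is three unrelated characters, which is
the kind of quiet failure that forgetting the agreed byte order leads to.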
-----------------------------------------------------------------------------
Mark Leisher
Computing Research Lab
New Mexico State University
Box 30001, Dept. 3CRL
Las Cruces, NM 88003

Cinema, radio, television, magazines are a school of inattention:
people look without seeing, listen without hearing.
    -- Robert Bresson
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/