And according to the Unicode Consortium:
A: Yes, UTF-8 can contain a BOM. However, it makes no difference as
to the endianness of the byte stream. UTF-8 always has the same
byte order. An initial BOM is only used as a signature -- an
indication that an otherwise unmarked text file is in UTF-8. Note
that some recipients of UTF-8 encoded data do not expect a BOM.
Where UTF-8 is used transparently in 8-bit environments, the use of
a BOM will interfere with any protocol or file format that expects
specific ASCII characters at the beginning, such as the use of "#!"
at the beginning of Unix shell scripts. [AF] & [MD]
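To make the FAQ's point concrete, here is a minimal Python sketch. It is not from the FAQ, and the helper name `has_shebang` is hypothetical. It shows that U+FEFF always encodes to the same three bytes in UTF-8, and that those bytes, placed in front of a script, hide the "#!" that a byte-oriented check expects at offset 0.

```python
import codecs

# U+FEFF encodes to the same three bytes in UTF-8 on every platform,
# since UTF-8 has no big-/little-endian variants.
BOM_UTF8 = "\ufeff".encode("utf-8")
assert BOM_UTF8 == codecs.BOM_UTF8 == b"\xef\xbb\xbf"

def has_shebang(data: bytes) -> bool:
    """Hypothetical byte-level check: does the file start with "#!"?"""
    return data[:2] == b"#!"

script = b"#!/bin/sh\necho hello\n"
with_bom = BOM_UTF8 + script

print(has_shebang(script))    # True
print(has_shebang(with_bom))  # False: the first bytes are EF BB, not "#!"
```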
And:
In the absence of a protocol supporting its use as a BOM and
when not at the beginning of a text stream, U+FEFF should normally
not occur.
And:
3. Some byte oriented protocols expect ASCII characters at the
beginning of a file. If UTF-8 is used with these protocols, use of
the BOM as encoding form signature should be avoided.
4. Where the precise type of the data stream is known (e.g. Unicode
big-endian or Unicode little-endian), the BOM should not be used.
In particular, whenever a data stream is declared to be UTF-16BE,
UTF-16LE, UTF-32BE or UTF-32LE a BOM must not be used. See also [
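Point 4 can be illustrated with Python's codecs. This illustrates one library's behavior and is not part of the Unicode text. A decoder told the exact form, such as "utf-16-be", does not treat a leading FE FF as a signature, so it survives as a U+FEFF character in the decoded text. Only the generic "utf-16" codec consumes it. Python's "utf-8" and "utf-8-sig" codecs split the same way.

```python
data_be = b"\xfe\xff\x00A"  # BOM followed by "A" in big-endian UTF-16

# Generic "utf-16": the BOM is read as a signature and consumed.
print(repr(data_be.decode("utf-16")))     # 'A'

# Declared "utf-16-be": the byte order is already known, so FE FF is
# not a signature; it decodes as the character U+FEFF.
print(repr(data_be.decode("utf-16-be")))  # '\ufeffA'

# UTF-8: "utf-8" keeps a leading BOM as U+FEFF, "utf-8-sig" strips it.
data_u8 = b"\xef\xbb\xbfA"
print(repr(data_u8.decode("utf-8")))      # '\ufeffA'
print(repr(data_u8.decode("utf-8-sig")))  # 'A'
```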
Why not fix SciTE so that it doesn't insert characters it shouldn't?
-elf
On Sun, 9 Sep 2007, Pierpaolo Bernardi wrote:
On 9/9/07, Graham Fawcett <[EMAIL PROTECTED]> wrote:
On 9/8/07, Pierpaolo Bernardi <[EMAIL PROTECTED]> wrote:
UTF8 has no BOM. A BOM in a utf8 file should be there only if you
put it there.
Not true.
http://en.wikipedia.org/wiki/Byte_Order_Mark
UTF-8 is defined by the Unicode Consortium, not by Wikipedia.
See here for example: http://unicode.org/faq/utf_bom.html#29
which says that you can put a BOM in a UTF-8 file (of course, you can
put whatever character you want in a file), but it is a character
like every other character; it has no particular meaning with respect to the encoding.
Then, maybe Chicken could treat U+FEFF as whitespace, to work
around this bug in SciTE, and maybe in other broken tools.
P.
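Two ways a reader could tolerate such files, as a minimal Python sketch. This is not how Chicken's reader works, and both function names are hypothetical. `decode_source` drops a single leading U+FEFF, in line with the FAQ's note that U+FEFF should not normally occur past the start of a stream. `skip_whitespace` follows the whitespace suggestion above, applied to the BOM code point U+FEFF. Note that `str.isspace()` returns False for U+FEFF (a format character, category Cf), so it has to be tested explicitly.

```python
def decode_source(raw: bytes) -> str:
    """Hypothetical: decode UTF-8 and drop one leading U+FEFF signature."""
    text = raw.decode("utf-8")
    return text[1:] if text.startswith("\ufeff") else text

def skip_whitespace(text: str, pos: int = 0) -> int:
    """Hypothetical: advance past whitespace, counting U+FEFF as whitespace."""
    # "\ufeff".isspace() is False, so U+FEFF is checked explicitly.
    while pos < len(text) and (text[pos].isspace() or text[pos] == "\ufeff"):
        pos += 1
    return pos

raw = b"\xef\xbb\xbf(display 42)\n"
print(decode_source(raw))            # (display 42)
text = raw.decode("utf-8")
print(text[skip_whitespace(text):])  # (display 42)
```

Stripping only a leading signature is the more conservative of the two, since it leaves any U+FEFF that appears later in the text untouched.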
_______________________________________________
Chicken-users mailing list
Chicken-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/chicken-users