Paul Hardy <unifoun...@gmail.com> writes:

> That might not be the only UTF-8 that appears in such files someday
> though, so a more general solution would be to start the file with the
> UTF-8 signature, aka the Byte Order Mark (BOM).  This is the UTF-8
> encoding of U+FEFF, which is 0xEF 0xBB 0xBF or octal \357 \273 \277.
> Then a web browser should display UTF-8 characters within the text file
> properly.

Hi Paul,

I don't believe it's correct to expect UTF-8 files to include this.  I've
heard of BOMs being used this way from the very early days of Unicode, but
so far as I understand it, the world has largely given up on this approach
and UTF-8 generators generally do not produce them.  Debian is full of
UTF-8 files (copyright files, changelog files, etc.), and I don't believe
we include a BOM anywhere.  I don't think it makes sense for Policy to go
to special effort to be unique in this regard.

You should just assume that all text files in Debian are UTF-8 unless they
are declared otherwise and configure browsers and other file readers
accordingly.
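
As a rough illustration of what "assume UTF-8" can look like in a file
reader, here is a minimal Python sketch.  The helper names are
hypothetical and not taken from any Debian tool.  It relies on Python's
standard "utf-8-sig" codec, which decodes UTF-8 and drops a leading BOM
if one happens to be present, so files with or without the signature
read the same way.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if data starts with the UTF-8 signature EF BB BF."""
    # codecs.BOM_UTF8 is the UTF-8 encoding of U+FEFF.
    return data.startswith(codecs.BOM_UTF8)


def read_text_assuming_utf8(data: bytes) -> str:
    """Decode bytes as UTF-8 without requiring a BOM (hypothetical helper)."""
    # "utf-8-sig" strips one leading BOM if present and otherwise
    # behaves like plain "utf-8"; invalid input raises UnicodeDecodeError.
    return data.decode("utf-8-sig")


if __name__ == "__main__":
    plain = "caf\u00e9".encode("utf-8")
    signed = codecs.BOM_UTF8 + plain
    print(has_utf8_bom(plain), has_utf8_bom(signed))
    print(read_text_assuming_utf8(plain) == read_text_assuming_utf8(signed))
```

The point of the sketch is only that the reader decides the encoding up
front, and a BOM, if it shows up, is tolerated rather than required.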

(Also, if you're viewing things in a web browser, just view the HTML
files.  It will be a much better experience.)

-- 
Russ Allbery (r...@debian.org)               <http://www.eyrie.org/~eagle/>
