Paul Hardy <unifoun...@gmail.com> writes:

> That might not be the only UTF-8 that appears in such files someday
> though, so a more general solution would be to start the file with the
> UTF-8 signature, aka the Byte Order Mark (BOM). This is the UTF-8
> encoding of U+FEFF, which is 0xEF 0xBB 0xBF or octal \357 \273 \277.
> Then a web browser should display UTF-8 characters within the text file
> properly.
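
For reference, the byte values mentioned in the quoted text can be checked with a few lines of Python. This is only a minimal sketch, not anything Debian tooling is known to do: the helper name `read_text` is hypothetical, and it assumes the input really is UTF-8. It reads a file's bytes whether or not the signature is present, using the standard `utf-8-sig` codec.

```python
import codecs

# U+FEFF encoded as UTF-8 gives the three-byte signature from the quote.
bom = "\ufeff".encode("utf-8")
assert bom == b"\xef\xbb\xbf" == b"\357\273\277" == codecs.BOM_UTF8


def read_text(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping one leading BOM if present.

    Hypothetical helper for illustration; assumes the data is UTF-8.
    """
    # "utf-8-sig" skips an optional leading signature and otherwise
    # decodes exactly like plain "utf-8".
    return data.decode("utf-8-sig")


with_bom = codecs.BOM_UTF8 + "café".encode("utf-8")
without_bom = "café".encode("utf-8")

# Both inputs decode to the same text, so a reader does not need the
# signature in order to handle the file.
print(read_text(with_bom) == read_text(without_bom))
```

A reader written this way accepts files with or without the signature, so it does not depend on generators emitting one.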
Hi Paul,

I don't believe it's correct to expect UTF-8 files to include this. I've heard of BOMs being used this way since the very early days of Unicode, but so far as I understand it, the world has largely given up on this approach and UTF-8 generators do not produce them.

Debian is full of UTF-8 files (copyright files, changelog files, etc.), and I don't believe we include BOMs anywhere. I don't think it makes sense for Policy to go to special effort to be unique in this regard.

You should just assume that all text files in Debian are UTF-8 unless they are declared otherwise, and configure browsers and other file readers accordingly.

(Also, if you're viewing things in a web browser, just view the HTML files. It will be a much better experience.)

-- 
Russ Allbery (r...@debian.org) <http://www.eyrie.org/~eagle/>