On Tue, Jul 22, 2014 at 11:48 AM, Stephan Beal <sgb...@googlemail.com> wrote:
> On Tue, Jul 8, 2014 at 9:37 PM, Stephan Beal <sgb...@googlemail.com>
> wrote:
>
>> No characters between 128 and 255 are valid UTF-8, to avoid confusion
>> with the many encodings which use that range.
>>
>
> For the record, that's apparently wrong. My local man pages (and
> experimentation with the termbox API) say otherwise:
> ....
> So the range is used, but it encodes to two UTF-8 characters.
>

Actually, 1 Unicode character encoded into 2 UTF-8 bytes.

FWIW, UTF-8 has an optional Byte Order Mark, 0xEF 0xBB 0xBF, that can
appear at the beginning of a file. This is just the UTF-8 encoding of
code point U+FEFF, which is the actual Unicode Byte Order Mark. For
UTF-8, this mark is really only useful as a hint that the following
text might be UTF-8 encoded Unicode. For the UTF-16 and UTF-32
encodings, the mark tells the receiver of the text the order of the
bytes within each 16- or 32-bit code unit (presuming that the file
actually is UTF-16 or UTF-32 encoded text).
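
In case it helps anyone following along, here is a small Python sketch of the two points above: code points 128..255 each encode to two UTF-8 bytes, and the UTF-8 BOM is just U+FEFF encoded as UTF-8. The helper names (utf8_len, is_valid_utf8, strip_utf8_bom) are made up for this illustration and have nothing to do with fossil or termbox.

```python
import codecs


def utf8_len(cp: int) -> int:
    """Number of bytes needed to encode code point `cp` in UTF-8."""
    return len(chr(cp).encode("utf-8"))


def is_valid_utf8(data: bytes) -> bool:
    """True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM (0xEF 0xBB 0xBF) if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Every code point in 128..255 takes exactly two bytes in UTF-8.
assert all(utf8_len(cp) == 2 for cp in range(128, 256))

# U+00E9 is one character, two bytes. The single byte 0xE9 by itself is
# not valid UTF-8, which is probably where the original confusion came from.
assert chr(0xE9).encode("utf-8") == b"\xc3\xa9"
assert not is_valid_utf8(b"\xe9")

# The UTF-8 BOM is nothing more than U+FEFF encoded as UTF-8.
assert "\ufeff".encode("utf-8") == b"\xef\xbb\xbf" == codecs.BOM_UTF8

# In UTF-16 the same code point reveals the byte order of the code units.
assert "\ufeff".encode("utf-16-le") == b"\xff\xfe"
assert "\ufeff".encode("utf-16-be") == b"\xfe\xff"
```

Python's built-in "utf-8-sig" codec does the same BOM stripping on decode, so strip_utf8_bom is only there to make the mechanics visible.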
_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users