On Tue, Jul 22, 2014 at 6:47 PM, Ron W <ronw.m...@gmail.com> wrote:

> On Tue, Jul 22, 2014 at 11:48 AM, Stephan Beal <sgb...@googlemail.com>
> wrote:
>
>> So the range is used, but it encodes to two UTF-8 characters.
>>
>
> Actually, 1 Unicode character encoded into 2 UTF-8 bytes.
>
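A minimal Python sketch of Ron's correction: one code point in the range U+0080..U+07FF becomes two UTF-8 bytes (110xxxxx 10xxxxxx). The helpers `utf8_len` and `encode_two_byte` are hypothetical, written only for illustration; the length ranges follow RFC 3629. The last lines show why "character" is slippery: the same visible glyph can also be spelled as two code points (`e` followed by U+0301 COMBINING ACUTE ACCENT), which then takes three bytes.

```python
def utf8_len(cp: int) -> int:
    """Bytes UTF-8 needs for one code point (ranges from RFC 3629)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp <= 0x7F:
        return 1      # 0xxxxxxx
    if cp <= 0x7FF:
        return 2      # 110xxxxx 10xxxxxx
    if cp <= 0xFFFF:
        return 3      # 1110xxxx 10xxxxxx 10xxxxxx
    return 4          # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx


def encode_two_byte(cp: int) -> bytes:
    """Hand-encode a code point in U+0080..U+07FF as 110xxxxx 10xxxxxx."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("not in the two-byte range")
    # High 5 bits go into the lead byte, low 6 bits into the continuation byte.
    return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])


e_acute = "\u00e9"                      # one code point: U+00E9
print(len(e_acute))                     # 1 (code points)
print(len(e_acute.encode("utf-8")))     # 2 (bytes)
print(encode_two_byte(0xE9) == e_acute.encode("utf-8"))  # True

# Same visible glyph built from two code points: 'e' + U+0301.
decomposed = "e\u0301"
print(len(decomposed))                  # 2 (code points)
print(len(decomposed.encode("utf-8")))  # 3 (bytes: 1 + 2)
```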
One would think i'd be more conscious of how i throw around byte vs character :/. i'm still not clear on the whole char-vs-code-point bit, though.

> FWIW, FYI, UTF-8 has an optional Byte Order Mark, 0xEF 0xBB 0xBF, that can
> appear at the beginning of a file. This is just the UTF-8 encoding of code
> point U+FEFF, which is the actual Unicode Byte Order Mark. For UTF-8,
> this mark is really only useful as a suggestion that the following text
> might be UTF-8 encoded Unicode. For UTF-16 and UTF-32 encodings, this mark
> tells the receiver of the text the order of bytes within the 16-
> or 32-bit encoding units (presuming that the file is actually UTF-16 or 32
> encoded text).
>

AFAIK a BOM is not recommended for UTF-8, because (except for the use you point out) it's meaningless and confuses so many tools. That's (partially) what Wikipedia says, anyway (and i didn't write it).

--
----- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of those who insist on a perfect world, freedom will have to do." -- Bigby Wolf
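The BOM bytes from Ron's paragraph above can be checked with Python's standard `codecs` module, which defines `codecs.BOM_UTF8`, and the built-in `utf-8-sig` codec, which drops a leading BOM when decoding. This is a small sketch; `strip_utf8_bom` is a hypothetical helper for illustration, not part of any library. It also hints at why a BOM confuses tools: a plain UTF-8 decode keeps it as an invisible U+FEFF in front of the first real character.

```python
import codecs

# U+FEFF encoded as UTF-8 is exactly the three "UTF-8 BOM" bytes.
assert "\ufeff".encode("utf-8") == codecs.BOM_UTF8 == b"\xef\xbb\xbf"

# In UTF-16 the same code point reveals the byte order of each 16-bit unit.
print("\ufeff".encode("utf-16-be").hex())  # feff  (big-endian)
print("\ufeff".encode("utf-16-le").hex())  # fffe  (little-endian)


def strip_utf8_bom(data: bytes):
    """Return (data without one leading UTF-8 BOM, whether a BOM was found)."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):], True
    return data, False


raw = codecs.BOM_UTF8 + "hello".encode("utf-8")
print(strip_utf8_bom(raw))            # (b'hello', True)
# A plain UTF-8 decode keeps the mark as an invisible U+FEFF...
print(repr(raw.decode("utf-8")))      # '\ufeffhello'
# ...while the "utf-8-sig" codec drops it.
print(raw.decode("utf-8-sig"))        # hello
```

Only a BOM at the very start is stripped: U+FEFF appearing later in the text is interpreted as a zero width no-break space, not as a byte order mark.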
_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users