Re: [fossil-users] File contains invalid UTF-8, but is not UTF-8.

Stephan Beal Tue, 22 Jul 2014 10:01:48 -0700

On Tue, Jul 22, 2014 at 6:47 PM, Ron W <ronw.m...@gmail.com> wrote:

> On Tue, Jul 22, 2014 at 11:48 AM, Stephan Beal <sgb...@googlemail.com>
> wrote:
>
>> So the range is used, but it encodes to two UTF-8 characters.
>>
>
> Actually, 1 Unicode character encoded into 2 UTF-8 bytes.
>


One would think i'd be more conscious of how i throw around byte vs
character :/. i'm still not clear on the whole char-vs-code point bit,
though.
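
For what it's worth, the byte / code point / character distinction can be sketched in a few lines of Python. This is only an illustration: the helper name `describe` and the sample strings are made up for the example, and "character" here means what a reader would perceive as one letter.

```python
import unicodedata

def describe(text):
    """Return (number of code points, number of UTF-8 bytes) for a string."""
    return len(text), len(text.encode("utf-8"))

# One code point (U+00E9) that needs two bytes in UTF-8.
precomposed = "\u00e9"

# Two code points (U+0065 + U+0301 COMBINING ACUTE ACCENT) that a reader
# still sees as a single character.
decomposed = "e\u0301"

print(describe(precomposed))
print(describe(decomposed))

# NFC normalization composes the pair into the single code point.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

So a byte count, a code point count, and a perceived-character count can all differ for the same visible text.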


> FWIW, FYI, UTF-8 has an optional Byte Order Mark, 0xEF 0xBB 0xBF, that can
> appear at the beginning of a file. This is just the UTF-8 encoding of code
> point U+FEFF, which is the actual Unicode Byte Order Mark. For UTF-8,
> this mark is really only useful as a suggestion that the following text
> might be UTF-8 encoded Unicode. For UTF-16 and UTF-32 encodings, this mark
> is used to inform the receiver of the text of the order of bytes within the
> 16 or 32 bit encoding units (presuming that the file is actually UTF-16 or
> 32 encoded text).
>

AFAIK a BOM is not recommended for UTF-8, because it's (except for the use
you point out) meaningless and confuses so many tools. That's (partially)
what Wikipedia says, anyway (and i didn't write it).
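
A small Python sketch of the BOM behaviour discussed above, assuming nothing beyond the standard `codecs` module. The helper `strip_utf8_bom` and the sample payload are hypothetical names for illustration, not anything Fossil itself does.

```python
import codecs

BOM = "\ufeff"  # the code point U+FEFF

# The same code point serializes differently depending on the encoding.
utf8_bom = BOM.encode("utf-8")       # the three bytes EF BB BF
utf16_be = BOM.encode("utf-16-be")   # FE FF: big-endian byte order
utf16_le = BOM.encode("utf-16-le")   # FF FE: little-endian byte order

def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM if present; otherwise return data unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

payload = codecs.BOM_UTF8 + "hello".encode("utf-8")

# The plain "utf-8" codec keeps U+FEFF as the first character of the text,
# which is one way a BOM ends up confusing tools; "utf-8-sig" removes it.
plain = payload.decode("utf-8")
sig = payload.decode("utf-8-sig")
```

In UTF-8 the byte order is fixed, so the three bytes carry no ordering information; they only hint that the data is probably UTF-8.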

-- 
----- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do." -- Bigby Wolf

_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
