Re: [fossil-users] File contains invalid UTF-8, but is not UTF-8.

Ron W Tue, 22 Jul 2014 09:48:43 -0700

On Tue, Jul 22, 2014 at 11:48 AM, Stephan Beal <sgb...@googlemail.com>
wrote:

> On Tue, Jul 8, 2014 at 9:37 PM, Stephan Beal <sgb...@googlemail.com>
> wrote:
>
>> No characters between 128 and 255 are valid UTF-8, to avoid confusion
>> with the many encodings which use that range.
>>
>
> For the record, that's apparently wrong. My local man pages (and
> experimentation with the termbox API) say otherwise:
> ....
> So the range is used, but it encodes to two UTF-8 characters.
>

Actually, it is 1 Unicode character encoded into 2 UTF-8 bytes.
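
To illustrate the point, here is a minimal Python sketch. The sample
character U+00E9 and the helper name is_valid_utf8 are my own picks for
illustration; they do not come from Fossil or termbox.

```python
# U+00E9 (e with acute accent) is a code point in the 128..255 range.
ch = "\u00e9"
encoded = ch.encode("utf-8")

# One Unicode character, two UTF-8 bytes: 0xC3 0xA9.
print(len(ch), len(encoded), encoded.hex())  # 1 2 c3a9


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The two-byte sequence is valid UTF-8, but the lone byte 0xE9
# (the Latin-1 encoding of the same character) is not.
print(is_valid_utf8(b"\xc3\xa9"))  # True
print(is_valid_utf8(b"\xe9"))      # False
```

So bytes in that range do appear in valid UTF-8, just never as a
stand-alone single-byte character.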

FWIW, FYI, UTF-8 has an optional Byte Order Mark, 0xEF 0xBB 0xBF, that can
appear at the beginning of a file. This is just the UTF-8 encoding of code
point U+FEFF, which is the actual Unicode Byte Order Mark. For UTF-8,
this mark is really only useful as a hint that the following text
might be UTF-8 encoded Unicode. For the UTF-16 and UTF-32 encodings, this
mark tells the receiver the order of the bytes within the 16 or 32 bit
encoding units (presuming that the file is actually UTF-16 or UTF-32
encoded text).
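
A rough Python sketch of the above, using the BOM constants from the
standard codecs module. The function name sniff_bom is hypothetical, and
the result is only a hint: a file without a BOM may still be UTF-8, and a
UTF-16 LE file whose first character after the BOM is U+0000 would be
misreported as UTF-32 LE.

```python
import codecs
from typing import Optional

# U+FEFF encoded as UTF-8 gives the three bytes mentioned above.
assert "\ufeff".encode("utf-8") == codecs.BOM_UTF8 == b"\xef\xbb\xbf"

# In UTF-16 the same code point reveals the byte order.
assert "\ufeff".encode("utf-16-be") == b"\xfe\xff"
assert "\ufeff".encode("utf-16-le") == b"\xff\xfe"

# UTF-32 is checked before UTF-16 because the UTF-32 LE mark
# (FF FE 00 00) starts with the UTF-16 LE mark (FF FE).
_BOMS = (
    ("utf-32-le", codecs.BOM_UTF32_LE),
    ("utf-32-be", codecs.BOM_UTF32_BE),
    ("utf-8", codecs.BOM_UTF8),
    ("utf-16-le", codecs.BOM_UTF16_LE),
    ("utf-16-be", codecs.BOM_UTF16_BE),
)


def sniff_bom(data: bytes) -> Optional[str]:
    """Guess an encoding from a leading BOM, or return None if there is none."""
    for name, bom in _BOMS:
        if data.startswith(bom):
            return name
    return None


print(sniff_bom(b"\xef\xbb\xbfhello"))  # utf-8
print(sniff_bom(b"\xff\xfeh\x00"))      # utf-16-le
print(sniff_bom(b"hello"))              # None
```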

_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
