On Tue, Jul 8, 2014 at 3:38 PM, Andy Bradford <amb-fos...@bradfords.org>
wrote:

> That's a  good suggestion for fixing  the Tcl script, but  I'm still not
> sure why Fossil thinks that è is UTF-8. I thought it was extended ASCII.
>
> > > I didn't think 0xe8 was UTF-8, but maybe I'm mistaken?
> >
> > In the  fossil UI, all  files are  displayed assuming the  encoding is
> > UTF-8.
>
> That  explains the  strange character  displayed  in the  browser. If  I
> switch my browser to ISO-8859-1 it displays fine.
>
> > More likely  is that  people are  not aware  that such  characters can
> > cause unexpected problems.
>
> The only  thing unexpected has been  the warning from Fossil  for a file
> that previously had no warnings. :-)
>
> Sounds like my options are either to  answer Yes, or update the Tcl file
> that I have stored in a Fossil repository to use \xe8.
>

UTF-8 characters are encoded as a series of strictly formatted bytes, from
1 to 4 bytes in length. The bit patterns of the bytes determine whether a
stream is valid UTF-8 or not. In UTF-8, a 0xE8 byte (0b11101000) is the
lead byte of a 3-byte sequence, so it must be followed by two continuation
bytes of the form 0b10xxxxxx. The warning you are seeing means the stream
is not valid UTF-8. A lone 0xE8 byte could be an "extended ASCII"
character from one of the ISO-8859-X code pages, or it could be real
binary data that just happens to be mostly ASCII text.
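
A minimal Python sketch of that structural check. The helper name
is_valid_utf8 is hypothetical, and the check is simplified: it only
looks at lead/continuation byte patterns and does not reject overlong
3- and 4-byte forms or surrogates, which a strict decoder such as
Python's own bytes.decode("utf-8") does. It shows why a lone 0xE8 fails
while the same character encoded as UTF-8 (0xC3 0xA8) passes, and why the
ISO-8859-1 view in the browser looks right.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Simplified UTF-8 structure check (hypothetical helper).

    Each lead byte announces how many 0b10xxxxxx continuation bytes
    must follow it. Overlong 3-/4-byte forms and surrogates are NOT checked.
    """
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:             # 0xxxxxxx: 1-byte sequence (ASCII)
            need = 0
        elif 0xC2 <= b <= 0xDF:  # 110xxxxx: lead byte of a 2-byte sequence
            need = 1
        elif 0xE0 <= b <= 0xEF:  # 1110xxxx: lead byte of a 3-byte sequence (0xE8)
            need = 2
        elif 0xF0 <= b <= 0xF4:  # 11110xxx: lead byte of a 4-byte sequence
            need = 3
        else:                    # stray continuation byte or never-valid byte
            return False
        cont = data[i + 1 : i + 1 + need]
        if len(cont) != need or any((c & 0xC0) != 0x80 for c in cont):
            return False         # missing or malformed continuation bytes
        i += 1 + need
    return True


latin1 = "caffè".encode("latin-1")  # b'caff\xe8'
utf8 = "caffè".encode("utf-8")      # b'caff\xc3\xa8'

print(is_valid_utf8(latin1))        # False: 0xE8 expects 2 continuation bytes
print(is_valid_utf8(utf8))          # True
print(latin1.decode("latin-1"))     # caffè (the browser's ISO-8859-1 view)
```

The same byte 0xE8 is a complete character in ISO-8859-1 but only the
first third of a character in UTF-8, which is why the two views disagree.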

I think the best idea is to encode these "special" characters as escape
sequences whenever possible.
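
For the Tcl file, a small Python sketch could rewrite every non-ASCII
character as a Tcl \uXXXX escape. The helper name tcl_escape_non_ascii is
hypothetical. It assumes the file's bytes are ISO-8859-1 and that the
characters sit in double-quoted strings or other places where Tcl
performs backslash substitution; inside a braced literal such as {è} Tcl
leaves the backslash alone. It emits the four-digit form \u00e8 rather
than \xe8 because older Tcl releases let \x consume more than two hex
digits, so "\xe8a" would not mean è followed by a.

```python
def tcl_escape_non_ascii(text: str) -> str:
    """Replace each non-ASCII character with a Tcl \\uXXXX escape (hypothetical helper)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                 # plain ASCII passes through
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04x}")     # exactly four hex digits, e.g. \u00e8
        else:
            # \uXXXX only covers the Basic Multilingual Plane
            raise ValueError(f"U+{cp:X} needs more than four hex digits")
    return "".join(out)


raw = b'puts "\xe8"\n'                   # bytes as stored in the repository
source = raw.decode("latin-1")           # assumption: file is ISO-8859-1
escaped = tcl_escape_non_ascii(source)
print(escaped, end="")                   # puts "\u00e8"
print(escaped.isascii())                 # True
```

The rewritten text is plain ASCII, and plain ASCII is always valid UTF-8.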

-- 
Scott Robison
_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
