On Tue, Jul 8, 2014 at 3:38 PM, Andy Bradford <amb-fos...@bradfords.org> wrote:
> That's a good suggestion for fixing the Tcl script, but I'm still not
> sure why Fossil thinks that è is UTF-8. I thought it was extended ASCII.
>
> > > I didn't think 0xe8 was UTF-8, but maybe I'm mistaken?
> >
> > In the fossil UI, all files are displayed assuming the encoding is
> > UTF-8.
>
> That explains the strange character displayed in the browser. If I
> switch my browser to ISO-8859-1 it displays fine.
>
> > More likely is that people are not aware that such characters can
> > cause unexpected problems.
>
> The only thing unexpected has been the warning from Fossil for a file
> that previously had no warnings. :-)
>
> Sounds like my options are either to answer Yes, or update the Tcl file
> that I have stored in a Fossil repository to use \xe8.
>

UTF-8 characters are encoded as a series of strictly formatted bytes,
from 1 to 4 bytes in length. The bit patterns of the bytes determine
whether a stream is considered valid UTF-8 or not. In UTF-8, a 0xE8
byte must be followed by two bytes of the form 0b10xxxxxx. The warning
you are seeing means that the stream is not valid UTF-8.

A lone 0xE8 byte could be an "extended ASCII" character from one of the
ISO-8859-X code pages. Or it could be real binary data that just
happens to have mostly ASCII text in it.

I think the best idea is to encode these "special" characters as escape
sequences whenever possible.

--
Scott Robison
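P.S. To make the byte rule above concrete, here is a rough Python
sketch. It is only an illustration, not how Fossil itself performs the
check; the helper name is_valid_utf8 and the sample word are made up
for the example.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (illustrative helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xE8 is 0b11101000: the 1110 prefix announces a 3-byte sequence, so the
# next two bytes must both look like 0b10xxxxxx.
assert 0xE8 >> 4 == 0b1110

# "très" saved as ISO-8859-1: 0xE8 is followed by a plain ASCII 's' (0x73),
# which is not a continuation byte, so the stream is not valid UTF-8.
latin1_bytes = b"tr\xe8s"

# The same word saved as UTF-8: è becomes the two bytes 0xC3 0xA8.
utf8_bytes = "tr\u00e8s".encode("utf-8")

print(is_valid_utf8(latin1_bytes))        # False
print(is_valid_utf8(utf8_bytes))          # True
print(latin1_bytes.decode("iso-8859-1"))  # très
```

The ISO-8859-1 decode on the last line is what the browser does when
you switch its encoding by hand.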