On Sun, 09 Mar 2014 01:30:36 +0100 Giuliano Colla <giuliano.co...@fastwebnet.it> wrote:
>[...]
> I was aware of that. My problem is that the char I must add to the Utf8
> string is calculated run time, and is in the range Unicode $A0-$BF.

The Unicode ranges are given in "code points". These are abstract values
that must be encoded in bytes. The most common encodings are UTF-8 and
UTF-16. Code point $A0 is encoded as two bytes in UTF-8: $C2 $A0.

> I had assumed (wrongly) that the compiler was smart enough to convert a
> type "char" to UTF8,

A char is not a code point. A char is an element of a string. Every byte
encoding consists of chars, and so does UTF-8.

> when concatenating it to an UTf8 string. Instead it
> turns out that the character is appended as it is, which leads to an
> invalid UTF8 character (above 127), which displays as a crossed box.
> IMHO that's an FPC bug.

It's not a bug.

> When I realized that, I then tried to explicitly convert the Unicode
> char to UTF8, but again I failed, this time because of the default
> behavior which is to map char <-> Unicode only in the range 0-127.

That's because UTF-8 maps Unicode 0-127 to one byte with the same value
as the code point. Above that it uses a different mapping.

> Anything above 127 becomes a question mark.
> Therefore my symbol displays as a question mark.
> IMHO that's a silly FPC limitation.

Maybe you underestimate FPC. FPC supports various source encodings.
Lazarus uses UTF-8 by default.

>[...]

There are some useful UTF-8 functions in the units LazUTF8 and
LazFileUtils.

Mattias

-- 
_______________________________________________
Lazarus mailing list
Lazarus@lists.lazarus.freepascal.org
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus
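
To make the mapping described above concrete, here is a minimal sketch of appending a code point that is calculated at run time to a UTF-8 string. CodePointToUTF8 is a hypothetical helper written for this illustration, not an FPC or Lazarus routine, and the offset of 5 only stands in for the run-time value. LazUTF8 should offer a ready-made function for this (UnicodeToUTF8, if I remember the name correctly), so check the unit's interface before rolling your own.

```pascal
program AppendCodePoint;

{$mode objfpc}{$H+}

// Hypothetical helper, not part of FPC or Lazarus: encodes one Unicode
// code point as UTF-8 bytes. It does not reject surrogates ($D800-$DFFF)
// or values above $10FFFF.
function CodePointToUTF8(CodePoint: Cardinal): string;
begin
  Result := '';
  if CodePoint < $80 then
  begin
    // 0-127: one byte with the same value as the code point
    SetLength(Result, 1);
    Result[1] := Chr(Byte(CodePoint));
  end
  else if CodePoint < $800 then
  begin
    // two bytes: 110xxxxx 10xxxxxx
    SetLength(Result, 2);
    Result[1] := Chr(Byte($C0 or (CodePoint shr 6)));
    Result[2] := Chr(Byte($80 or (CodePoint and $3F)));
  end
  else if CodePoint < $10000 then
  begin
    // three bytes: 1110xxxx 10xxxxxx 10xxxxxx
    SetLength(Result, 3);
    Result[1] := Chr(Byte($E0 or (CodePoint shr 12)));
    Result[2] := Chr(Byte($80 or ((CodePoint shr 6) and $3F)));
    Result[3] := Chr(Byte($80 or (CodePoint and $3F)));
  end
  else
  begin
    // four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    SetLength(Result, 4);
    Result[1] := Chr(Byte($F0 or (CodePoint shr 18)));
    Result[2] := Chr(Byte($80 or ((CodePoint shr 12) and $3F)));
    Result[3] := Chr(Byte($80 or ((CodePoint shr 6) and $3F)));
    Result[4] := Chr(Byte($80 or (CodePoint and $3F)));
  end;
end;

var
  S: string;
  Offset: Cardinal;
  I: Integer;
begin
  S := 'abc';
  Offset := 5;  // stand-in for the value calculated at run time
  // Append the encoded bytes, not the raw char value.
  S := S + CodePointToUTF8($A0 + Offset);
  // Dump the bytes of the resulting string in decimal.
  for I := 1 to Length(S) do
    Write(Ord(S[I]), ' ');
  WriteLn;
end.
```

This prints 97 98 99 194 165, that is 'abc' followed by $C2 $A5, the two-byte UTF-8 form of code point $A5. Appending Chr($A5) directly would add the single byte $A5, which is not valid UTF-8 on its own.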