Neil Hodgson wrote:
Reece Dunn:

> The UTF8 <==> UCS conversion utilities in scintilla/src/UniConversion.h
> would be useful to the outside world. For example, in my application, I am
> returning the selected text and searching using UCS encoded Windows BSTRs.

   At least one person thinks that UniConversion sucks and mailed me
extensively on the subject. It's normally better to use platform
facilities for this. Scintilla defines enough for just its use so it
doesn't have to unify platform calls. If you want better generic
Unicode features use a project like ICU that is meant for the job.
SinkWorld has better code than Scintilla.

I know that UniConversion is not meant to be a full Unicode conversion library. I'm currently using SetCodePage to get Scintilla to do the correct conversions using the native platform calls.

That said, Scintilla stores the character buffers natively as UTF8. I am successfully using UniConversion to provide Scintilla text to BSTR conversions, without the heavyweight use of ICU as I don't need generic Unicode facilities.

For me, UniConversion works and allows me to keep the code lightweight and fast. I don't need any of the more complex support that ICU or the Mozilla Firefox localisation interfaces provide.
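
To make the kind of lightweight conversion being discussed concrete, here is a minimal sketch of a UTF-8 to UTF-16 decoder. It is not Scintilla's UniConversion code: the name Utf8ToUtf16 is hypothetical, the input is assumed to be well-formed UTF-8, and no validation is attempted. Unlike the behaviour noted in this thread, the sketch also handles 4-byte sequences by emitting a surrogate pair.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, not part of Scintilla: decode UTF-8 into UTF-16 code units.
// Assumes well-formed input; malformed or overlong sequences are not detected.
std::u16string Utf8ToUtf16(const std::string &utf8) {
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::uint32_t cp = 0;
        std::size_t len = 1;
        // The lead byte determines the sequence length and the first payload bits.
        if (lead < 0x80)      { cp = lead;        len = 1; }
        else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }
        else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }
        else                  { cp = lead & 0x07; len = 4; }
        if (i + len > utf8.size())
            break;  // truncated trailing sequence: stop rather than read past the end
        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        i += len;
        if (cp < 0x10000) {
            // BMP character: a single UTF-16 code unit.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Non-BMP character: split into a high and low surrogate.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

On Windows the resulting code units could then be copied into a BSTR (for example with SysAllocStringLen), though that step is left out here to keep the sketch portable.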

> NOTE: The conversion algorithm doesn't handle the 4th UTF8 byte. I'm
> assuming this is due to lack of support for UTF16 surrogate pairs and
> Unicode planar characters in Windows.

   AFAICT non-BMP use of Windows requires the Chinese GB-18030 add on.
SinkWorld supports non-BMP characters but I won't bother with it yet
for Scintilla.

Do you mean that the 4th byte of a UTF8 string corresponds to the Chinese symbols? If so, aren't these available with the MS Mincho (and I think the MS Gothic) fonts? You need to install the Japanese/Chinese language support to provide the character support. When you have the correct language installed (tested on Windows XP), the characters are available. Thus, you can also view those characters with the regular fonts such as Times New Roman.

Provided that you have the correct character, you can use the normal Windows rendering (i.e. ExtTextOutW) to render the Japanese/Chinese text. For example, U+3301 (I think) would be rendered as the "<<" character.

However, there are two planar character sets. IIRC, these are in the range U+1Dxxx, and are the Fraktur mathematical characters and another math-related character set. From what I can recall, Internet Explorer has (had?) problems rendering these characters. I think this also extends to Windows. I thought these would be in the 4th UTF8 byte.
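
As a rough illustration of which characters need a 4th UTF-8 byte, the sketch below encodes a single code point and lets the length be inspected. The function name EncodeUtf8 is hypothetical, and the input is assumed to be a valid code point (at most U+10FFFF and not a surrogate). A BMP character such as U+3301 comes out as three bytes, while U+1D504 (MATHEMATICAL FRAKTUR CAPITAL A, outside the BMP) comes out as four.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Assumes cp <= 0x10FFFF and that cp is not a surrogate; no validation is done.
std::string EncodeUtf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // U+0000..U+007F: one byte.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // U+0080..U+07FF: two bytes.
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // U+0800..U+FFFF (rest of the BMP, including CJK): three bytes.
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // U+10000..U+10FFFF (supplementary planes): four bytes.
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

So the 4th byte only appears for code points above U+FFFF, which in UTF-16 are exactly the ones that need a surrogate pair.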

- Reece


_______________________________________________
Scintilla-interest mailing list
http://mailman.lyra.org/mailman/listinfo/scintilla-interest
