Neil Hodgson wrote:
Reece Dunn:
> The UTF8 <==> UCS conversion utilities in scintilla/src/UniConversion.h
> would be useful to the outside world. For example, in my application, I am
> returning the selected text and searching using UCS encoded Windows BSTRs.
At least one person thinks that UniConversion sucks and mailed me
extensively on the subject. It's normally better to use platform
facilities for this. Scintilla defines enough for just its use so it
doesn't have to unify platform calls. If you want better generic
Unicode features use a project like ICU that is meant for the job.
SinkWorld has better code than Scintilla.
I know that UniConversion is not meant to be a full Unicode conversion
library. I'm currently using SetCodePage to get Scintilla to do the correct
conversions using the native platform calls.
That said, Scintilla stores the character buffers natively as UTF8. I am
successfully using UniConversion to provide Scintilla text to BSTR
conversions, without the heavyweight use of ICU as I don't need generic
Unicode facilities.
For me, UniConversion works and allows me to keep the code lightweight and
fast. I don't need any of the more complex support that ICU or the Mozilla
Firefox localisation interfaces provide.
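To illustrate how small a conversion like this can be, here is a minimal
sketch of a UTF-8 to UTF-16 decoder in portable C++. It is not the
UniConversion code from Scintilla: the function name Utf8ToUtf16 is made up
for this example, it returns a std::u16string rather than a BSTR, and it
assumes mostly well-formed input (overlong forms and encoded surrogates are
not rejected). Unlike the routine discussed here, it also handles 4-byte
sequences by emitting a surrogate pair.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of Scintilla): decode UTF-8 into UTF-16.
// Invalid lead bytes, bad continuation bytes and truncated sequences are
// replaced with U+FFFD. Overlong forms are NOT detected in this sketch.
std::u16string Utf8ToUtf16(const std::string &utf8) {
    std::u16string out;
    const std::size_t n = utf8.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::uint32_t cp = 0;
        std::size_t trail = 0;  // number of continuation bytes expected
        if (lead < 0x80) {
            cp = lead;
        } else if ((lead & 0xE0) == 0xC0) {
            cp = lead & 0x1F; trail = 1;
        } else if ((lead & 0xF0) == 0xE0) {
            cp = lead & 0x0F; trail = 2;
        } else if ((lead & 0xF8) == 0xF0) {
            cp = lead & 0x07; trail = 3;  // the "4th byte" case
        } else {
            out.push_back(0xFFFD);  // stray continuation or invalid lead
            ++i;
            continue;
        }
        bool ok = trail <= n - i - 1;  // enough bytes left?
        for (std::size_t k = 1; ok && k <= trail; ++k) {
            const unsigned char b = static_cast<unsigned char>(utf8[i + k]);
            if ((b & 0xC0) != 0x80) {
                ok = false;
            } else {
                cp = (cp << 6) | (b & 0x3F);
            }
        }
        if (!ok) {
            out.push_back(0xFFFD);
            ++i;  // resynchronise on the next byte
            continue;
        }
        i += trail + 1;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else if (cp <= 0x10FFFF) {
            cp -= 0x10000;  // split into a high/low surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(0xFFFD);  // beyond the Unicode range
        }
    }
    return out;
}
```

On Windows the resulting code units could then be copied into a BSTR, for
example with SysAllocStringLen, since wchar_t is 16 bits wide there.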
> NOTE: The conversion algorithm doesn't handle the 4th UTF8 byte. I'm
> assuming this is due to lack of support for UTF16 surrogate pairs and
> Unicode planar characters in Windows.
AFAICT non-BMP use of Windows requires the Chinese GB-18030 add on.
SinkWorld supports non-BMP characters but I won't bother with it yet
for Scintilla.
Do you mean that the 4th byte of a UTF8 string corresponds to the Chinese
symbols? If so, aren't these available with the MS Mincho (and I think the
MS Gothic) fonts? You need to install the Japanese/Chinese language support
to provide the character support. When you have the correct language
installed (tested on Windows XP), the characters are available. Thus, you
can also view those characters with the regular fonts such as Times New
Roman.
Provided that you have the correct character, you can use the normal Windows
rendering (i.e. ExtTextOutW) to render the Japanese/Chinese text. For
example, U+3301 (I think) would be rendered as the "<<" character.
However, there are two planar character sets. IIRC, these are in the range
U+1Dxxx, and are the Fraktur mathematical characters and another
math-related character set. From what I can recall, Internet Explorer has
(had?) problems rendering these characters. I think this also extends to
Windows. I thought these would be in the 4th UTF8 byte.
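As a worked example of where the 4th byte comes in, the sketch below encodes
a single code point as UTF-8 and as UTF-16. The function names EncodeUtf8 and
EncodeUtf16 are made up for illustration and are not Scintilla functions, and
the code assumes the input is a valid code point (no check for the surrogate
range). CJK characters in the BMP, such as U+300A, need only three UTF-8
bytes and one UTF-16 unit. A code point above U+FFFF, such as U+1D504 (a
mathematical Fraktur letter), needs four UTF-8 bytes and a surrogate pair.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode one code point as UTF-8.
// Assumes cp <= 0x10FFFF and is not a surrogate; no validation is done.
std::string EncodeUtf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        // Three bytes: covers the rest of the BMP, including CJK.
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        // Four bytes: only needed for code points outside the BMP.
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    return out;
}

// Hypothetical helper: encode one code point as UTF-16 code units.
// Same assumption as above: the input is a valid code point.
std::u16string EncodeUtf16(std::uint32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        const std::uint32_t v = cp - 0x10000;  // 20 bits remain
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));    // high
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));  // low
    }
    return out;
}
```

With these, EncodeUtf8(0x300A) gives the three bytes E3 80 8A, while
EncodeUtf8(0x1D504) gives F0 9D 94 84 and EncodeUtf16(0x1D504) gives the
pair D835 DD04.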
- Reece