Lars Gullik Bjønnes wrote:
> So what I plan to do is to store ucs-4 in our paragraph vector, when
> rendering transforms that in a frontend specific way to something the
> frontend can handle. For Qt this is ucs-2 strings, and use that to
> render. Chars/glyphs outside the basic plane will then have to be
> rendered with a '?'. But for gtk f.ex. that uses pango, we can handle
> the full unicode. (Since pango uses a ucs-4 unichar.)
>
Why do you want to store text in UTF-32? From what I understand from the
Unicode FAQ, UTF-32 has a large memory cost for little benefit over
UTF-16.

http://www.unicode.org/unicode/faq/utf_bom.html

Pango has been criticized for being a resource and memory hog. One point
made by Lars Knoll in his presentation is that the difficulty, when you
go down the Unicode lane, is not to degrade performance too much for
'normal' users.

With ucs-4, if I'm correct, you multiply the memory size of a LyX
document by 4; with ucs-2, you multiply it by 2.

Why go beyond the Basic Multilingual Plane, and therefore beyond a ucs-2
encoding? The Basic Multilingual Plane is not so basic. It covers all the
languages written today on our planet. I know there is a Ugaritic package
for LaTeX. But is it really serious to multiply the memory cost of LyX
for the two thousand people in the world who can write cuneiform?

Cheers,
Charles
-- 
http://www.kde-france.org
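P.S. To make the memory arithmetic concrete, here is a rough Python sketch. It assumes the baseline is an 8-bit encoding such as Latin-1 (one byte per character), which is where the factors of 2 and 4 come from. The helper name `encoded_size` and the sample strings are made up for illustration; nothing here is LyX code.

```python
def encoded_size(text, encoding):
    """Return the number of bytes needed to store text in the given encoding."""
    return len(text.encode(encoding))


# Hypothetical sample text, Latin-1 representable (11 characters).
sample = "Bonjour LyX"

# The "-le" codec variants are used so that no byte order mark is counted.
for encoding in ("latin-1", "utf-16-le", "utf-32-le"):
    print(encoding, encoded_size(sample, encoding))

# A character outside the Basic Multilingual Plane:
# U+10380 UGARITIC LETTER ALPA. UTF-16 needs a surrogate pair for it,
# so it takes as many bytes there as it does in UTF-32.
ugaritic_alpa = "\U00010380"
print("utf-16-le", encoded_size(ugaritic_alpa, "utf-16-le"))
print("utf-32-le", encoded_size(ugaritic_alpa, "utf-32-le"))
```

For text that stays inside the BMP, the 16-bit form is half the size of the 32-bit one; only characters outside the BMP cost the same in both.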