Hi Stephan,

this slipped under my desk...

On Thursday, 2006-11-16 17:39:28 +0100, Stephan Bergmann wrote:

> >This underlines the need of an iterator that takes care of such things.
> >I just wonder how combining characters should be best treated then.
> >I like the idea of an iterator returning normalized 32-bit code points,
> >but still that wouldn't cover all combinations like your key cap
> >example. It seems an iterator should return a string in those cases,
> >which on the other hand would make it slow and cumbersome to use. How
> >could that best be solved?
>
> What is the use case for such iteration? Can you give an example?

First note that I'm currently not sure about the idea of letting an
iterator return the normalized form, as even NFC may result in an
expansion, see
http://unicode.org/faq/normalization.html#12
and
http://www.unicode.org/reports/tr15/
and this already for a script like Hebrew
http://www.unicode.org/charts/normalization/chart_Hebrew.html
though the "usual" other scripts we support are not affected.
Furthermore there may be pitfalls with string concatenation in NFC (see
Unicode TR15) and we might want to ensure NFC internally, but might lose
some performance then.

Anyway, an example: given an encoded (utf-8/16/32) string one wants to
determine whether the first character of a word is a letter. An iterator
returning either NFC or Unicode code points could be used "on the fly"
for combining sequences if a 32-bit precomposition exists, but for your
example of keycaps I assume there is none. So an 'A'+'keycap' would be
returned as 'A' and misidentified. In these cases the iterator must
signal that the "string position" consists of more than just one 32-bit
value, and it must be possible to obtain that sequence.

Maybe the keycaps example is too arbitrarily constructed, I just think
a design of an iterator should take these things into account. The
Unicode normalization FAQ and TR15 mention enough real world examples,
I guess.
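
To make the use case a bit more concrete, here is a rough sketch in
Python (chosen only because its standard unicodedata module makes this
short; it is not a proposal for the actual API). The names positions()
and starts_with_letter() are made up for illustration. Two assumptions
are baked in: a "string position" is taken to be a base code point plus
all directly following code points of general category M*, which is
simpler than real grapheme cluster segmentation, and an enclosing mark
such as the keycap U+20E3 is taken to mean "no longer a letter".

```python
import unicodedata


def positions(text):
    """Yield one tuple of code points per "string position".

    Assumption: a position is a base code point followed by any code
    points of general category M* (Mn, Mc, Me). This is a simplification
    and not the full grapheme cluster rules.
    """
    cluster = []
    # Normalize first so that sequences with a precomposed form
    # collapse into a single code point where possible.
    for ch in unicodedata.normalize("NFC", text):
        is_mark = unicodedata.category(ch).startswith("M")
        if cluster and not is_mark:
            yield tuple(ord(c) for c in cluster)
            cluster = []
        cluster.append(ch)
    if cluster:
        yield tuple(ord(c) for c in cluster)


def starts_with_letter(word):
    """Return True if the first string position of word is a letter."""
    first = next(positions(word), None)
    if first is None:
        return False
    base, marks = first[0], first[1:]
    if not unicodedata.category(chr(base)).startswith("L"):
        return False
    # Assumed policy: an enclosing mark (category Me, e.g. the keycap
    # U+20E3) turns the base letter into something that is not a letter.
    return all(unicodedata.category(chr(m)) != "Me" for m in marks)
```

With this, 'e' followed by U+0301 comes back as the single code point
U+00E9, 'A' followed by U+20E3 comes back as a two-element position that
the caller can inspect, and U+FB1D (Hebrew yod with hiriq) comes back as
two code points even after NFC, which is the expansion mentioned above.
The length of the returned tuple is what plays the role of the "signal".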

Maybe I'm totally wrong with my view on this and I would be delighted if
someone enlightened me :)

  Eike

--
 OOo/SO Calc core developer. Number formatter stricken i18n transpositionizer.
 OpenOffice.org Engineering at Sun: http://blogs.sun.com/GullFOSS
 Please don't send personal mail to the [EMAIL PROTECTED] account, which I use
 for mailing lists only and don't read from outside Sun. Thanks.