Hi Stephan,

This slipped under my desk..

On Thursday, 2006-11-16 17:39:28 +0100, Stephan Bergmann wrote:

> >This underlines the need of an iterator that takes care of such things.
> >I just wonder how combining characters should be best treated then.
> >I like the idea of an iterator returning normalized 32-bit code points,
> >but still that wouldn't cover all combinations like your key cap
> >example. It seems an iterator should return a string in those cases,
> >which on the other hand would make it slow and cumbersome to use. How
> >could that best be solved?
> 
> What is the use case for such iteration?  Can you give an example?

First, note that I'm currently not sure about the idea of letting an
iterator return the normalized form, as even NFC may result in an
expansion, see http://unicode.org/faq/normalization.html#12 and
http://www.unicode.org/reports/tr15/. This already happens for a script
like Hebrew, see http://www.unicode.org/charts/normalization/chart_Hebrew.html,
though the other scripts we usually support are not affected.
Furthermore, string concatenation in NFC has pitfalls (see Unicode TR15):
concatenating two NFC strings does not necessarily yield an NFC string.
We might want to ensure NFC internally, but might lose some performance
then.
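
To make both points concrete, here is a minimal sketch. It assumes
Python's standard unicodedata module purely for illustration; it is not
taken from our code base, and the variable names are mine.

```python
import unicodedata

# U+FB2A HEBREW LETTER SHIN WITH SHIN DOT is a composition exclusion:
# NFC decomposes it and does not recompose it.
shin_with_dot = "\uFB2A"
nfc = unicodedata.normalize("NFC", shin_with_dot)
expanded = len(nfc) > len(shin_with_dot)  # one code point becomes two

# Concatenation pitfall: each half is NFC on its own, the result is not.
left, right = "e", "\u0301"  # 'e' and COMBINING ACUTE ACCENT
joined = left + right
joined_is_nfc = unicodedata.normalize("NFC", joined) == joined
```

So a caller that wants to guarantee NFC would have to renormalize (at
least around the seam) after every concatenation, which is where the
performance concern comes from.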

Anyway, an example: given an encoded (UTF-8/16/32) string, one wants to
determine whether the first character of a word is a letter. An iterator
returning 32-bit code points, either raw or NFC-composed, could handle
combining sequences "on the fly" if a 32-bit precomposed form exists, but
for your keycap example I assume there is none. So an 'A'+'keycap' would
be returned as 'A' and misidentified. In such cases the iterator must
signal that the "string position" consists of more than just one 32-bit
value, and it must be possible to obtain that sequence.
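
A rough sketch of what I mean, again in Python only for illustration.
The names iter_positions and starts_with_letter are hypothetical, and
grouping a base code point with its trailing marks (general category M*)
is a deliberate simplification, not full grapheme cluster segmentation.
Python strings are already sequences of code points, so the decoding
step from UTF-8/16/32 is left out here.

```python
import unicodedata
from typing import Iterator

def iter_positions(s: str) -> Iterator[str]:
    """Yield each base code point with its trailing marks, NFC-composed."""
    cluster = ""
    for ch in s:
        # Mn, Mc and Me all count as marks; U+20E3 (keycap) is Me.
        is_mark = unicodedata.category(ch).startswith("M")
        if cluster and not is_mark:
            yield unicodedata.normalize("NFC", cluster)
            cluster = ""
        cluster += ch
    if cluster:
        yield unicodedata.normalize("NFC", cluster)

def starts_with_letter(word: str) -> bool:
    """True only if the first position is one code point and a letter."""
    first = next(iter_positions(word), "")
    return len(first) == 1 and unicodedata.category(first).startswith("L")
```

Here a position of length 1 is the cheap common case, and a longer
string is the signal that the position is a sequence. With this,
'e' + U+0301 composes to a single letter, while 'A' + U+20E3 stays a
two code point sequence and is not reported as a plain letter. Whether
that is the right answer for the keycap case is a policy question, of
course.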

Maybe the keycap example is too contrived; I just think the design of
an iterator should take these things into account. The Unicode
normalization FAQ and TR15 mention enough real-world examples, I guess.
Maybe I'm totally wrong in my view on this, and I would be delighted if
someone enlightened me :)

  Eike

-- 
 OOo/SO Calc core developer. Number formatter stricken i18n transpositionizer.
 OpenOffice.org Engineering at Sun: http://blogs.sun.com/GullFOSS
 Please don't send personal mail to the [EMAIL PROTECTED] account, which I use
 for mailing lists only and don't read from outside Sun. Thanks.
