On Wed, Jan 23, 2019 at 05:40:33PM +0300, Konstantin Tokarev wrote:
> 23.01.2019, 16:55, "Edward Welbourne" <edward.welbou...@qt.io>:
> > All of this discussion ignores a major elephant: QString's indexing is
> > by 16-bit UTF-16 code units, not by Unicode characters. We've had Unicode
> > for a couple of decades now.
> >
> > We *should* have a string type (I don't care what you call it) that acts
> > on strings indexed by Unicode characters, not in terms of a
> > representation. Whether that string type internally uses UTF-16 or
> > UTF-8 should be invisible to its user. Ideally it would be capable of
> > carrying its data internally in either form (so as to avoid needless
> > conversion when both producer and consumer use the same form) and of
> > converting between the two (e.g. so as to append efficiently) as needed.
> 
> I think this is excessive. The most common operations with strings in
> application code are:
>
> * Pass the string around or compare as an opaque token
> * Draw the string on screen, e.g. with QPainter (while technically it
>   falls in the previous category, I think it's important enough to
>   deserve a separate item)
> * Find a substring or pattern (regex) inside the string
> * Split the string by character, pattern, or index boundaries found by means
>   of the previous item
> 
> I think the only common cases where dealing with Unicode grapheme clusters
> is required are
>
> * Handling of text cursor movement
> * Implementation of text shaping, i.e. what Harfbuzz is doing
> 
> I think having a special iterator would be quite enough for the cursor case.
> Such an iterator could abstract away the underlying encoding, instead of
> forcing everyone to convert to UTF-16 first.
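
To make the iterator idea concrete, here is a rough sketch of stepping over
a string by code point regardless of whether the storage is UTF-16 or UTF-8.
The names nextCodePoint and countCodePoints are hypothetical and not part of
any Qt API, and the code assumes well-formed input. Note that it yields code
points, not grapheme clusters; cursor movement would still need cluster
segmentation layered on top of something like this.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not a Qt API): decode one code point from UTF-16
// starting at index i, and advance i past it. An unpaired surrogate is
// returned unchanged.
inline char32_t nextCodePoint(std::u16string_view s, std::size_t &i)
{
    const char16_t hi = s[i++];
    if (hi >= 0xD800 && hi <= 0xDBFF && i < s.size()) {
        const char16_t lo = s[i];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            ++i;
            return 0x10000 + ((char32_t(hi) - 0xD800) << 10)
                           + (char32_t(lo) - 0xDC00);
        }
    }
    return hi;
}

// Same interface for UTF-8 storage. Assumes valid UTF-8 (no error handling).
inline char32_t nextCodePoint(std::string_view s, std::size_t &i)
{
    const unsigned char b = static_cast<unsigned char>(s[i++]);
    if (b < 0x80)
        return b;
    // Number of continuation bytes implied by the lead byte.
    int extra = (b >= 0xF0) ? 3 : (b >= 0xE0) ? 2 : 1;
    char32_t cp = b & (0x3F >> extra);
    while (extra-- > 0 && i < s.size())
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    return cp;
}

// Caller code is identical for both encodings: it only sees code points.
template <typename View>
std::size_t countCodePoints(View s)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++n)
        nextCodePoint(s, i);
    return n;
}
```

With this, a string holding "a" followed by U+1F600 occupies three UTF-16
code units or five UTF-8 bytes, yet countCodePoints reports two in both
cases, which is the kind of encoding independence the quoted text asks for.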

All of that is scarily close to my opinion on the topic.

Andre'