On Wed, Jan 23, 2019 at 05:40:33PM +0300, Konstantin Tokarev wrote:
> 23.01.2019, 16:55, "Edward Welbourne" <edward.welbou...@qt.io>:
> > All of this discussion ignores a major elephant: QString's indexing is
> > by 16-bit UTF-16 tokens, not by Unicode characters. We've had Unicode
> > for a couple of decades now.
> >
> > We *should* have a string type (I don't care what you call it) that acts
> > on strings indexed by Unicode characters, not in terms of a
> > representation. Whether that string type internally uses UTF-16 or
> > UTF-8 should be invisible to its user. Ideally it would be capable of
> > carrying its data internally in either form (so as to avoid needless
> > conversion when both producer and consumer use the same form) and of
> > converting between the two (e.g. so as to append efficiently) as needed.
>
> I think this is excessive. The most common operations on strings in
> application code are:
>
> * Pass the string around or compare it as an opaque token
> * Draw the string on screen, e.g. with QPainter (while technically it
>   falls into the previous category, I think it's important enough to
>   deserve a separate item)
> * Find a substring or pattern (regex) inside the string
> * Split the string by character, pattern, or index boundaries found by
>   means of the previous item
>
> I think the only common cases where dealing with Unicode grapheme
> clusters is required are
>
> * Handling of text cursor movement
> * Implementation of text shaping, i.e. what HarfBuzz is doing
>
> I think having a special iterator would be quite enough for the cursor
> case. Such an iterator could abstract away the underlying encoding,
> instead of forcing everyone to convert to UTF-16 first.
All of that is scarily close to my opinion on the topic.

Andre'
_______________________________________________
Development mailing list
Development@qt-project.org
https://lists.qt-project.org/listinfo/development