On Saturday, 8 March 2014 at 20:52:40 UTC, H. S. Teoh wrote:
Or more to the point, do you know of any experience that you can share about code that attempts to process these sorts of strings on a per character basis? My suspicion is that any code that operates on such strings, if they have any claim to correctness at all, must be substring-based, rather than character-based.

That's pretty much it. Unless you are working within the confines of certain languages (alphabets, scripts, etc.), many notions that are valid for English or other European languages lose their meaning in general. This includes the notion of "characters" - in the fully general case, you can only treat a string as a stream of code units (or code points, if you wish, but as has been discussed to death, that is rarely useful on its own).
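For example (a minimal, untested sketch in D using Phobos' std.range and std.uni; the combining-mark string is just an illustration of mine): the same string gives three different answers to "how long is it?" depending on whether you count code units, code points or graphemes, which is exactly why "number of characters" isn't a well-defined question in general.

import std.stdio : writeln;
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    // "noël" with 'ë' written as 'e' + combining diaeresis (U+0308)
    string s = "noe\u0308l";

    writeln(s.length);                // 6 code units (UTF-8 bytes)
    writeln(s.walkLength);            // 5 code points (decoded dchars)
    writeln(s.byGrapheme.walkLength); // 4 graphemes (what a user would call characters)
}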

An application that has to handle user text (which may be in any language) pretty much has to treat string variables as "holy":
- no indexing
- no slicing
- no counting anything
- no toUpper/toLower (whether from std.ascii or std.uni)
etc. (a sketch of how these operations go wrong follows below)
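To make that concrete, here is a rough sketch (D with Phobos; the particular string and the Turkish case-mapping example are mine, not something from the thread): indexing hands you a raw code unit, slicing at an arbitrary index can cut a code point in half, counting gives you code units rather than anything a user would recognize, and even std.uni's case mapping is locale-blind.

import std.stdio : writeln;
import std.uni : toUpper;
import std.utf : validate, UTFException;

void main()
{
    string s = "église";          // 'é' takes two UTF-8 code units (0xC3 0xA9)

    // Indexing yields a raw code unit, not a character:
    writeln(cast(ubyte) s[0]);    // 195, the first half of 'é'

    // Slicing at an arbitrary index can split a code point, giving invalid UTF-8:
    auto broken = s[0 .. 1];
    try { validate(broken); }
    catch (UTFException e) { writeln("slice is not valid UTF-8"); }

    // Counting code units says 7; a reader sees 6 letters:
    writeln(s.length);            // 7

    // Even Unicode-aware case mapping is locale-blind: for Turkish text the
    // uppercase of "i" should be "İ" (U+0130), but we get plain "I":
    writeln(toUpper("i"));        // "I"
}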

All processing and transformations (line breaking, normalization, etc.) need to be done using the relevant Unicode algorithms.
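Normalization, for instance (a minimal sketch assuming Phobos' std.uni.normalize; the café string is just my example): two spellings that a user would consider the same word compare unequal until both sides are brought to the same normalization form.

import std.stdio : writeln;
import std.uni;   // normalize, NFC

void main()
{
    string precomposed = "caf\u00E9";   // 'é' as a single code point (U+00E9)
    string decomposed  = "cafe\u0301";  // 'e' followed by combining acute (U+0301)

    writeln(precomposed == decomposed);                               // false
    writeln(normalize!NFC(precomposed) == normalize!NFC(decomposed)); // true
}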

I've posted something earlier which I'd like to take back:

[a-z] makes sense in English, and [а-я] makes sense in Russian

[а-я] makes sense for Russian, but it doesn't for Ukrainian, in the same way that [a-z] is useless for Portuguese. There are probably only a few ranges in Unicode that encompass exactly one alphabet, given how much the letters of similar languages' alphabets overlap.
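As a quick illustration (a sketch in D; the letter list is simply the Ukrainian-specific lowercase letters as I remember them): these letters sit outside the U+0430..U+044F block that [а-я] covers, so a plain range check silently drops them, much like [a-z] drops Portuguese's accented letters.

import std.stdio : writefln;

void main()
{
    // Ukrainian-specific lowercase letters (U+0491, U+0454, U+0456, U+0457)
    dchar[] letters = ['ґ', 'є', 'і', 'ї'];

    foreach (c; letters)
        writefln("%s (U+%04X) matches [а-я]? %s",
                 c, cast(uint) c, c >= 'а' && c <= 'я');
    // prints "false" for all four
}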
