On 3/9/14, 4:34 AM, Peter Alexander wrote:
> I think this is the main confusion: the belief that iterating by code
> point has utility.

> If you care about normalization then neither by code unit, by code
> point, nor by grapheme are correct (except in certain language subsets).

I suspect that code unit iteration is the worst, as it works only with ASCII and perchance with ASCII single-byte extensions. Then we have code point iteration, which works with a larger spectrum of languages. One question would be how large that spectrum is. If it's larger than English, then that would be nice, because we would've made progress.
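
To make the distinction concrete, here's a minimal sketch (illustrative strings; the counts are in the comments):

import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

void main()
{
    // "é" as one precomposed code point (U+00E9) -- two UTF-8 code units.
    string s = "caf\u00E9";
    writeln(s.length);     // 5 code units (bytes, for UTF-8)
    writeln(s.walkLength); // 4 code points (front/popFront auto-decode)

    // "é" as 'e' plus combining acute (U+0301) -- two code points.
    string t = "cafe\u0301";
    writeln(t.walkLength);            // 5 code points
    writeln(t.byGrapheme.walkLength); // 4 graphemes, what the user "sees"
}

Code unit iteration mangles the 'é' in both strings; code point iteration handles the first but not the second.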

I don't know about normalization beyond discussions in this group, but as far as I understand from http://www.unicode.org/faq/normalization.html, normalization would be a one-step process, after which code point iteration would cover still more human languages. No? I'm pretty sure it's more complicated than that, so please illuminate me :o).
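
To illustrate what I mean (a sketch using std.uni's normalize; the strings are contrived):

import std.range : walkLength;
import std.stdio : writeln;
import std.uni : NFC, normalize;

void main()
{
    string precomposed = "caf\u00E9";  // 'é' as one code point
    string decomposed  = "cafe\u0301"; // 'e' + combining acute accent

    assert(precomposed != decomposed);                // distinct sequences
    assert(normalize!NFC(decomposed) == precomposed); // equal after NFC

    // After NFC the code point count matches the perceived character
    // count -- but only for characters that have precomposed forms.
    writeln(normalize!NFC(decomposed).walkLength); // 4
}

The catch, as far as I can tell: not every combining sequence has a precomposed form, so NFC alone can't rescue code point iteration for every language.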

> If you don't care about normalization then by code unit is just as good
> as by code point, but you don't need to specialise everywhere in Phobos.

> AFAIK, there is only one exception, stuff like s.all!(c => c == 'é'),
> but as Vladimir correctly points out: (a) by code point, this is still
> broken in the face of normalization, and (b) are there any real
> applications that search a string for a specific non-ASCII character?

What happened to counting characters and such?
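
Regarding (a), a quick sketch of the breakage (contrived strings again):

import std.algorithm : canFind;
import std.stdio : writeln;

void main()
{
    string nfc = "caf\u00E9";  // 'é' precomposed (U+00E9)
    string nfd = "cafe\u0301"; // 'e' + combining acute (U+0301)

    // A code point search finds only the precomposed spelling:
    writeln(nfc.canFind('\u00E9')); // true
    writeln(nfd.canFind('\u00E9')); // false -- same text, no match
}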

> To those that think the status quo is better, can you give an example of
> a real-life use case that demonstrates this?

split(ter) comes to mind.
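
For instance, splitting on a non-ASCII delimiter works at the code point level today (a sketch; the bullet separator is an arbitrary choice):

import std.algorithm : splitter;
import std.stdio : writeln;

void main()
{
    // '•' (U+2022) is three code units in UTF-8, yet matching it works
    // because the separator is compared at the code point level.
    foreach (part; "alpha\u2022beta\u2022gamma".splitter('\u2022'))
        writeln(part); // prints alpha, beta, gamma
}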

> I do think it's probably too late to change this, but I think there is
> value in at least getting everyone on the same page.

Awesome.


Andrei
