On 09-Mar-2014 07:53, Vladimir Panteleev wrote:
On Sunday, 9 March 2014 at 03:26:40 UTC, Andrei Alexandrescu wrote:
I don't understand this argument. Iterating by code unit is not
meaningless if you don't want to extract meaning from each unit
iteration. For example, if you're parsing JSON or XML, you only care
about the syntax characters, which are all ASCII. And there is no
confusion of "what exactly are we counting here".

This was debated... people should not be looking at individual code
points, unless they really know what they're doing.

Should they be looking at code units instead?

No. They should only be looking at substrings.

This. Anyhow, searching for a dchar makes sense for _some_ languages; the problem is that the search shouldn't decode the whole string, but rather encode the needle properly and search for that.
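The "encode the needle, not the haystack" idea can be sketched outside of D. A minimal Python analogy (the names here are illustrative, not Phobos API): to find a non-ASCII character in UTF-8 text, encode the one-character needle to its UTF-8 code units and do a plain byte search. This is safe because UTF-8 is self-synchronizing: the encoding of one character never appears inside the encoding of another.

```python
# Python analogy (not D/Phobos): search UTF-8 without decoding the haystack.
haystack = "naïve café".encode("utf-8")  # raw UTF-8 code units
needle = "é".encode("utf-8")             # b'\xc3\xa9' -- encode the needle once

# Byte-level search, no per-character decoding of the haystack.
idx = haystack.find(needle)              # index in code units, not code points
```

The returned index is a code-unit offset, which is exactly what you want for slicing the underlying buffer.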

Basically the whole thread is about:
how do I work efficiently (without decoding) with UTF-8/UTF-16 in cases where it obviously can be done?
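The JSON/XML case mentioned above is the canonical example of "it obviously can be done": all the syntax characters are ASCII, and in UTF-8 every byte of a multi-byte sequence is >= 0x80, so it can never be mistaken for an ASCII delimiter. A hedged Python sketch (illustrative only, not a real parser):

```python
# Sketch: check brace balance of JSON-ish UTF-8 data by scanning raw
# code units. No decoding needed: UTF-8 continuation/lead bytes are all
# >= 0x80 and cannot collide with ASCII '{' or '}'.
data = '{"ключ": "значение"}'.encode("utf-8")  # non-ASCII content is fine

depth = 0
for b in data:           # iterates code units (ints), never decodes
    if b == ord('{'):
        depth += 1
    elif b == ord('}'):
        depth -= 1
# depth == 0 means braces are balanced
```

A real parser would also have to skip string literals, but the point stands: the structural scan touches only ASCII code units.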

The current situation is bad in that it undermines writing decode-less generic code. One easily falls into the auto-decode trap on the first .front, especially when called from some standard algorithm. The algorithm sees char[]/wchar[] and switches into decode mode via some special case. If it did that with _all_ char/wchar random-access ranges, it would at least be consistent.
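The trap is that auto-decoding silently changes the element type and count your generic code sees: char[] indexes code units, but .front hands back a decoded dchar. A rough Python analogy of the two views (again illustrative, not Phobos):

```python
# Two views of the same text, analogous to char[] vs. auto-decoded range:
s = "héllo"
units = s.encode("utf-8")  # code-unit view, like indexing a char[]

# The views disagree on length and element type; generic code that
# silently switches between them (as auto-decoding does) is inconsistent.
n_points = len(s)      # 5 code points
n_units = len(units)   # 6 code units: 'é' encodes as two bytes
```

In D terms: std.utf.byCodeUnit exists precisely to opt out of this and keep the code-unit view throughout.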

That, and wrapping your head around 2 sets of constraints. The amount of special-cased code around 2 types - char[]/wchar[] - is way too much, that much is clear.

--
Dmitry Olshansky
