On 5/27/16 7:19 AM, Chris wrote:
On Thursday, 26 May 2016 at 16:00:54 UTC, Andrei Alexandrescu wrote:
[snip]

I would agree only with the amendment "...if used naively", which is
important. Knowledge of how autodecoding works is a prerequisite for
writing fast string code in D. Also, little code should deal with one
code unit or code point at a time; instead, it should use standard
library algorithms for searching, matching, etc. When needed, iterating
every code unit is trivially done through indexing.
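
As a minimal sketch of that last point (the helper name countCodeUnit and the sample string are made up for illustration): indexing and .length see raw UTF-8 code units, while range primitives such as walkLength see decoded code points.

```d
import std.range : walkLength;

// Hypothetical helper: counts occurrences of one UTF-8 code unit
// by plain indexing, so no decoding takes place.
size_t countCodeUnit(string s, char unit)
{
    size_t n = 0;
    foreach (i; 0 .. s.length)
    {
        if (s[i] == unit)
            ++n;
    }
    return n;
}

void main()
{
    // "héllo": the 'é' (U+00E9) occupies two UTF-8 code units.
    string s = "h\u00E9llo";

    assert(s.length == 6);               // code units
    assert(s.walkLength == 5);           // code points (autodecoded)
    assert(countCodeUnit(s, 'l') == 2);  // indexing, no decoding
}
```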

I disagree.

Misunderstanding.

"if used naively" shouldn't be the default. A user (naively)
expects string algorithms to work as efficiently as possible without
overheads.

That's what happens with autodecoding.

Also allow me to point out that much of the slowdown can be addressed
tactically. The test c < 0x80 is highly predictable (in ASCII-heavy
text) and therefore easily speculated. We can and we should arrange
code to minimize impact.
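
A rough sketch of what such an arrangement could look like, assuming a hand-written helper (frontCodePoint is hypothetical and is not the actual Phobos code): the ASCII case returns immediately, and only non-ASCII lead bytes reach the full decoder.

```d
import std.utf : decode;

// Hypothetical helper: returns the first code point of a non-empty string.
dchar frontCodePoint(string s)
{
    assert(s.length > 0);
    immutable char c = s[0];

    // Fast path: a code unit below 0x80 is a complete ASCII code point.
    // In ASCII-heavy text this branch is taken almost every time.
    if (c < 0x80)
        return c;

    // Slow path: multi-byte sequence, defer to the full decoder.
    size_t index = 0;
    return decode(s, index);
}

void main()
{
    assert(frontCodePoint("abc") == 'a');
    assert(frontCodePoint("\u00E9t\u00E9") == '\u00E9'); // "été"
}
```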

And what if you deal with non-ASCII heavy text? Does the user have to
guess and micro-optimize for simple use cases?

Misunderstanding.

5. Very few algorithms require decoding.

The key here is leaving it to the standard library to do the right
thing instead of having the user wonder separately for each case.
These uses don't need decoding, and the standard library correctly
doesn't involve it (or if it currently does it has a bug):

s.find("abc")
s.findSplit("abc")
s.findSplit('a')
s.count!(c => "!()-;:,.?".canFind(c)) // punctuation

However the following do require autodecoding:

s.walkLength
s.count!(c => !"!()-;:,.?".canFind(c)) // non-punctuation
s.count!(c => c >= 32) // non-control characters

Currently the standard library operates at code point level even
though inside it may choose to use code units when admissible. Leaving
such a decision to the library seems like a wise thing to do.
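
For illustration, most of the calls listed above can be tried on a non-ASCII string. This is only a sketch: the sample text and the helper names (punctuation, nonPunctuation, nonControl) are made up here, and it says nothing about which strategy the library picks internally.

```d
import std.algorithm.searching : canFind, count, find, findSplit;
import std.range : walkLength;

// Illustrative wrappers around the calls from the lists above.
size_t punctuation(string s)    { return s.count!(c => "!()-;:,.?".canFind(c)); }
size_t nonPunctuation(string s) { return s.count!(c => !"!()-;:,.?".canFind(c)); }
size_t nonControl(string s)     { return s.count!(c => c >= 32); }

void main()
{
    // "naïve abc, déjà vu!": 19 code points, 22 UTF-8 code units.
    string s = "na\u00EFve abc, d\u00E9j\u00E0 vu!";
    assert(s.length == 22);

    // Results that could be computed on code units alone.
    assert(s.find("abc") == "abc, d\u00E9j\u00E0 vu!");
    auto parts = s.findSplit("abc");
    assert(parts[0] == "na\u00EFve " && parts[1] == "abc");
    assert(punctuation(s) == 2); // ',' and '!'

    // Results that depend on seeing whole code points.
    assert(s.walkLength == 19);
    assert(nonPunctuation(s) == 17);
    assert(nonControl(s) == 19);
}
```

Counting code units instead would give 22 for the last count, which is why those cases depend on decoding.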

But how is the user supposed to know without being a core contributor to
Phobos?

Misunderstanding. All examples work properly today because of autodecoding. -- Andrei
