Re: The Case Against Autodecode

Chris via Digitalmars-d Fri, 27 May 2016 04:21:56 -0700

On Thursday, 26 May 2016 at 16:00:54 UTC, Andrei Alexandrescu wrote:

[snip]

> I would agree only with the amendment "...if used naively", which is important. Knowledge of how autodecoding works is a prerequisite for writing fast string code in D. Also, little code should deal with one code unit or code point at a time; instead, it should use standard library algorithms for searching, matching etc. When needed, iterating every code unit is trivially done through indexing.
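A minimal sketch of what "iterating every code unit through indexing" looks like in D, assuming a UTF-8 `string`. The helpers `countSpacesIndexed` and `countSpacesByCodeUnit` are hypothetical names for illustration; `std.utf.byCodeUnit` is the Phobos adapter that presents a string as a range of `char`, so range algorithms see code units and skip decoding.

```d
import std.algorithm.searching : count;
import std.utf : byCodeUnit;

// Hypothetical helper: indexing a string yields UTF-8 code units (char),
// so this loop never decodes anything.
size_t countSpacesIndexed(string s)
{
    size_t n;
    foreach (i; 0 .. s.length)
        if (s[i] == ' ')
            ++n;
    return n;
}

// Hypothetical helper: byCodeUnit exposes the string as a range of char,
// so std.algorithm's count works on code units instead of decoded dchars.
size_t countSpacesByCodeUnit(string s)
{
    return s.byCodeUnit.count(' ');
}
```

Searching for an ASCII character at the code-unit level is safe because every byte of a multi-byte UTF-8 sequence is >= 0x80 and therefore can never equal an ASCII character.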

I disagree. "...if used naively" shouldn't be the default. A user (naively) expects string algorithms to work as efficiently as possible without overhead. To tell the user later that s/he shouldn't _naively_ have used a certain algorithm provided by the library is a bit cynical. Having to redesign a code base because of hidden behavior is a big turn-off; having to go through Phobos to determine where the hidden pitfalls are is not the user's job.

> Also allow me to point out that much of the slowdown can be addressed tactically. The test c < 0x80 is highly predictable (in ASCII-heavy text) and therefore easily speculated. We can and we should arrange code to minimize impact.
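The `c < 0x80` fast path can be sketched as a code-point counter that only calls the decoder for non-ASCII bytes. This is an illustrative example, not Phobos's actual implementation; `countCodePoints` is a hypothetical name, and the only library call is `std.utf.decode`, which decodes one code point starting at the given index and advances that index past it (throwing on invalid UTF-8).

```d
import std.utf : decode;

// Hypothetical helper: counts code points in a UTF-8 string, taking a
// cheap branch for ASCII bytes and decoding only multi-byte sequences.
size_t countCodePoints(string s)
{
    size_t n;
    size_t i;
    while (i < s.length)
    {
        if (s[i] < 0x80)
            ++i;          // ASCII: one code unit is one code point
        else
            decode(s, i); // multi-byte sequence: decode advances i past it
        ++n;
    }
    return n;
}
```

On mostly ASCII input the first branch is taken almost every time, which is what makes it easy for the branch predictor; on text dominated by non-Latin scripts the decoding branch runs for most code points, which is the case raised in the reply below.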

And what if you deal with text that isn't ASCII-heavy? Does the user have to guess and micro-optimize for simple use cases?

> 5. Very few algorithms require decoding.
> The key here is leaving it to the standard library to do the right thing instead of having the user wonder separately for each case. These uses don't need decoding, and the standard library correctly doesn't involve it (or if it currently does, it has a bug):
> s.find("abc")
> s.findSplit("abc")
> s.findSplit('a')
> s.count!(c => "!()-;:,.?".canFind(c)) // punctuation
>
> However, the following do require autodecoding:
>
> s.walkLength
> s.count!(c => !"!()-;:,.?".canFind(c)) // non-punctuation
> s.count!(c => c >= 32) // non-control characters
> Currently the standard library operates at code point level even though inside it may choose to use code units when admissible. Leaving such a decision to the library seems like a wise thing to do.
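To make the split between the two lists concrete, here is a small sketch that runs the punctuation predicates once over the autodecoded string (code points) and once over `byCodeUnit` (code units). The helper names and the sample string are made up for illustration. For the punctuation count the two levels must agree, because UTF-8 lead and continuation bytes never equal an ASCII character; for the non-punctuation count and for `walkLength` versus `.length`, they differ whenever the text contains multi-byte characters.

```d
import std.algorithm.searching : canFind, count;
import std.range : walkLength;
import std.utf : byCodeUnit;

// The punctuation set from the examples above (all ASCII).
enum punct = "!()-;:,.?";

// Over the string itself: Phobos autodecodes, so c is a dchar (code point).
size_t punctPoints(string s) { return s.count!(c => punct.canFind(c)); }
size_t nonPunctPoints(string s) { return s.count!(c => !punct.canFind(c)); }

// Over byCodeUnit: c is a char (UTF-8 code unit), and nothing is decoded.
size_t punctUnits(string s) { return s.byCodeUnit.count!(c => punct.canFind(c)); }
size_t nonPunctUnits(string s) { return s.byCodeUnit.count!(c => !punct.canFind(c)); }
```

With a sample such as "héllo, wörld!", the punctuation counts match at both levels, while the non-punctuation counts differ by the extra byte each of é and ö occupies in UTF-8, which is the sense in which those operations need decoding to count characters rather than bytes.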

But how is the user supposed to know without being a core contributor to Phobos? If using a library method that works well in one case can slow down your code in a slightly different case, something is wrong with the language/library design. For simple cases the burden shouldn't be on the user, or, if it is, s/he should be informed about it in order to be able to make well-informed decisions. Personally I wouldn't mind having to decide in each case what I want (provided I have a best practices cheat sheet :)), so I can get the best out of it. But having to keep guessing, testing and benchmarking every string-handling library function is not good at all.


[snip]
