I was thinking of improving std.find. We have this bug report:

http://d.puremagic.com/issues/show_bug.cgi?id=3923

which is pretty vague but does have a point.

For starters, it should be told that std.algorithm.find _does_ a lot, which at least partially justifies its complexity. One thing that I haven't seen other find()s doing is to be able to search in one pass a given range for multiple other ranges, like this:

int[] a = [ 1, 4, 2, 3 ];
assert(find(a, 5, 4, 3) == tuple([ 4, 2, 3 ], 2));

When passed more than two arguments, find returns a tuple continuing the searched ranged positioned on the element found and the 1-based index of the parameter that was found. The trick is that find() makes exactly one pass through the searched range, which is often more efficient than searching the same range for each element in turn. Also the one-pass approach works with input ranges so it doesn't put pressure on the range capabilities.

However the simplified find() looks like, I'd like to keep this feature unless it brings serious aggravation. Right now it's the #1 factor that complicates find's signature and implementation.

Another aspect I'd like to discuss is use of Boyer-Moore and related fast finding techniques. Currently the use of B-M is explicit and a bit awkward. I was thinking instead to automatically detect its appropriatedness and use it transparently. Bearophile posted at a link to such a blend string search routine (http://effbot.org/zone/stringlib.htm) that I think can be generalized quite easily.

Any ideas - please chime in.


Andrei

Reply via email to