Re: Improving std.algorithm.find

Andrei Alexandrescu Sun, 18 Jul 2010 18:25:16 -0700

On 07/18/2010 04:30 PM, Philippe Sigaud wrote:

On Sun, Jul 18, 2010 at 00:55, Andrei Alexandrescu  wrote:
You could consider than find() is a multiple-returns function: it may
return 0, 1 or many values (the successive finds). Which can, obviously,
be easily represented by a range. I see find() as bit like filter(): its
return type is a range determined by the predicate and whose elements
are the rest of the input range.
so:
find([ 1, 4, 2, 3, 4, 0 ], 4) returns a range whose elements are
[4,2,3,4,0] and [4,0].


If no needle is present in haystack, then the result range is empty.
It's just it's an empty range of ranges, not an empty version of the
original range.

A complement information can be how many steps were taken (that is, the
index of the found elements):  ([4,2,3,4,0], 1), ([4,0], 4). A policy or
another function could take care of that: findIndexed(haystack, needle)
or find!(withIndex)(haystack, needle).

I agree that a findAll function yielding a range is nice, however it'snot the simplest interface. I think there should be a function that justfinds something somewhere.

Now, for many needles... Currently, you pass the predicate "a==b" to
find() and use it on all needles. A possible generalization is to have a
isOneOf() predicate, called with isOneOf(a, b0, b1, ... bn) where a is
the current element being tested and the b's are the needles passed to
find(). This generic calling of pred with (a, needles) could be done for
any predicate, like for example:

find!(isBetween)(haystack, 0, 10)

Aren't such scenarios better served by putting the limits inside thepredicate? Otherwise the list of what can be done could go on forever.

The limit of this is you cannot easily get the index of the found
needle. btw, could you remind us why you used a 1-based index? My
opinion on this is that if you want to find any needle, then you mostly
care for them being present, and much less for wich one is the first.

I used one-based index to make tests easier - as you said, most oftenyou care for the presence so zero/nonzero is easiest to tell. Then Ithought it wouldn't hurt to provide the index since it's not any extra work.

Also, the found element is accessible by taking the front of the retuned
range. Why provide it to the caller?

Because that doesn't generalize well to searching ranges (as oposed toindividual elements).

Another problem I see is to get the nice current behavior of having a
needle being a range like haystack:

find([0,1,2,3,4,3,2,1], [4,3]) => [4,3,2,1]

And that's I particularly like.

Not sure I understand whether that's good or bad, but I also want toprovide the likes of findSkip() (it's been discussed here under adifferent name, findConsume) that positions the searched range after thefound range:


findSkip([0,1,2,3,4,3,2,1], [4,3]) => [2,1]

It's very useful in practice.

*****
     Another aspect I'd like to discuss is use of Boyer-Moore and
related fast finding techniques. Currently the use of B-M is explicit
and a bit awkward. I was thinking instead to automatically detect its
appropriatedness and use it transparently. Bearophile posted at a link
to such a blend string search routine
(http://effbot.org/zone/stringlib.htm) that I think can be generalized
quite easily.
*****

Indeed, reading this, it seems like it could be generalized easily. You
need a random-access range with a defined length. So the usage of such
an algo could indeed be done transparently. Hmm...

The current B-M implementation already works with any random-accessrange with length, but it's not integrated transparently with theregular find. I wonder what the threshold conditions might look like.


Andrei

Re: Improving std.algorithm.find

Reply via email to