Re: in not working for arrays is silly, change my view

Steven Schveighoffer via Digitalmars-d-learn Mon, 02 Mar 2020 13:35:55 -0800

On 3/2/20 3:52 PM, aliak wrote:

On Monday, 2 March 2020 at 15:47:26 UTC, Steven Schveighoffer wrote:
On 3/2/20 6:52 AM, Andrea Fontana wrote:
On Saturday, 29 February 2020 at 20:11:24 UTC, Steven Schveighofferwrote:
1. in is supposed to be O(lg(n)) or better. Generic code may dependon this property. Searching an array is O(n).
Probably it should work if we're using a "SortedRange".


int[] a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
auto p = assumeSorted(a);

assert(3 in p);
That could work. Currently, you need to use p.contains(3). opIn couldbe added as a shortcut.
It only makes sense if you have it as a literal though, asp.contains(3) isn't that bad to use:
assert(3 in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9].assumeSorted);
There's no guarantee that checking if a value is in a sorted list is anyfaster than checking if it's in a non sorted list. It's why sort usuallyswitches from a binary-esque algorithm to a linear one at a certainsize.

Well of course! A binary search needs Lg(n) comparisons for pretty muchany value, whereas a linear search is going to end early when it findsit. So there's no guarantee that searching for an element in the list isgoing to be faster one way or the other. But Binary search is going tobe faster overall because the complexity is favorable.

The list could potentially need to be _very_ large for p.containsto make a significant impact over canFind(p) AFAIK.
Here's a small test program, try playing with the numbers and see whathappens:
import std.random;
import std.range;
import std.algorithm;
import std.datetime.stopwatch;
import std.stdio;

void main()
{
     auto count = 1_000;
     auto max = int.max;

     alias randoms = generate!(() => uniform(0, max));

     auto r1 = randoms.take(count).array;
     auto r2 = r1.dup.sort;
     auto elem = r1[uniform(0, count)];


auto elem = r1[$-1]; // try this instead

     benchmark!(
         () => r1.canFind(elem),
         () => r2.contains(elem),
     )(1_000).writeln;
}
Use LDC and -O3 of course. I was hard pressed to get the sorted containsto be any faster than canFind.
This begs the question then: do these requirements on in make any sense?An algorithm can be log n (ala the sorted search) but still be amagnitude slower than a linear search... what has the world come to 🤦‍♂️
PS: Why is it named contains if it's on a SortedRange and canFindotherwise?


A SortedRange uses O(lgn) steps vs. canFind which uses O(n) steps.

If you change your code to testing 1000 random numbers, instead of arandom number guaranteed to be included, then you will see a significantimprovement with the sorted version. I found it to be about 10x faster.(most of the time, none of the other random numbers are included). Evenif you randomly select 1000 numbers from the elements, the binary searchwill be faster. In my tests, it was about 5x faster.

Note that the compiler can do a lot more tricks for linear searches, andCPUs are REALLY good at searching sequential data. But complexity isstill going to win out eventually over heuristics. Phobos needs to be ageneral library, not one that only caters to certain situations.


-Steve

Re: in not working for arrays is silly, change my view

Reply via email to