On Tuesday, 1 December 2015 at 22:48:56 UTC, Andrei Alexandrescu wrote:
On 12/01/2015 12:13 AM, Andrei Alexandrescu wrote:
On 11/30/15 9:47 PM, Timon Gehr wrote:
On 12/01/2015 03:33 AM, Timon Gehr wrote:
On 11/30/2015 09:57 PM, Andrei Alexandrescu wrote:
So now consider my square heaps. We have O(n) build time (just a bunch
of heapifications) and O(sqrt n) search.

How do you build in O(n)? (The initial array is assumed to be completely
unordered, afaict.)

(I meant to say: There aren't any assumptions on the initial ordering of
the array elements.)

That's quite challenging. (My O(n) estimate was off the cuff and possibly wrong.) Creating the structure entails simultaneously solving the selection problem (find the k smallest elements) for several values
of k. I'll post here if I come up with something. -- Andrei

OK, I think I have an answer to this in the form of an efficient algorithm.

First off: sizes 1+3+5+7+... seem a great choice, I'll use that for the initial implementation (thanks Titus!).

Second: the whole max-heap idea is a red herring - a min-heap is just as good, and in fact better. When searching, simply overshoot by one heap, then step back one heap to the left and do the final linear search there.

So the structure we're looking at is an array of adjacent min-heaps of sizes 1, 3, 5, etc. The heaps are ordered (the maximum of heap k is less than or equal to the minimum of heap k+1). Question is how do we build such an array of heaps in place starting from an unstructured array of size n.
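For concreteness, here's a small Python sketch (my own; the function name and the half-open-range convention are not from the thread) of where each heap lives. Since 1 + 3 + ... + (2k - 1) = k*k, heap k starts at index k*k:

```python
def heap_bounds(n):
    """Half-open index ranges of the adjacent heaps of sizes 1, 3, 5, ...
    covering n elements; heap k occupies a[k*k : (k+1)*(k+1)]."""
    bounds = []
    k = 0
    while k * k < n:
        bounds.append((k * k, min((k + 1) * (k + 1), n)))
        k += 1
    return bounds

# heap_bounds(16) -> [(0, 1), (1, 4), (4, 9), (9, 16)]
```

The last heap is simply truncated when n is not a perfect square.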

One simple approach is to just sort the array in O(n log n). This satisfies all properties - all adjacent subsequences are obviously ordered, and any subsequence has the min heap property. As an engineering approach we may as well stop here - sorting is a widely studied and well implemented algorithm. However, we hope to get away with less work because we don't quite need full sorting.
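As a sanity check, a hedged Python sketch of the two invariants (block boundaries assumed at the squares, per the 1, 3, 5, ... sizing); a sorted array passes both trivially:

```python
def is_valid(a):
    """Check both invariants: each block of size 2k+1 is a min-heap, and
    block k's maximum does not exceed block k+1's minimum (which, for a
    min-heap, is its first element)."""
    n = len(a)
    k = 0
    while k * k < n:
        lo, hi = k * k, min((k + 1) * (k + 1), n)
        block = a[lo:hi]
        for i in range(len(block)):           # min-heap property in-block
            for c in (2 * i + 1, 2 * i + 2):
                if c < len(block) and block[c] < block[i]:
                    return False
        if hi < n and max(block) > a[hi]:     # ordering between blocks
            return False
        k += 1
    return True
```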

Here's the intuition: the collection of heaps can be seen as one large heap that has a DAG structure (as opposed to a tree). In the DAG, the root of heap k+1 is the child of all leaves of heap k (see http://imgur.com/I366GYS which shows the DAG for the 1, 3, 5, and 7 heaps).

Clearly getting this structure to respect the heap property is all that's needed for everything to work - so we simply apply the classic heapify algorithm to it. It seems it can be applied almost unchanged - starting from the end, sift each element down the DAG.
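A possible Python rendition of that idea - my own sketch of the sift-down over the DAG, not a tested reference implementation. Inside a heap the usual parent/child arithmetic applies; each leaf of heap k additionally has the root of heap k+1 as its child, and we sift every element down, starting from the end:

```python
import math

def build(a):
    """Heapify the whole DAG in place. Heap k occupies a[k*k : (k+1)*(k+1)]."""
    n = len(a)

    def children(i):
        k = math.isqrt(i)                 # heap index: k*k <= i < (k+1)*(k+1)
        base, size = k * k, 2 * k + 1
        local = i - base
        if 2 * local + 1 < size:          # internal node: usual heap children
            yield base + 2 * local + 1
            if 2 * local + 2 < size:
                yield base + 2 * local + 2
        elif (k + 1) * (k + 1) < n:       # leaf: child is the next heap's root
            yield (k + 1) * (k + 1)

    def sift_down(i):
        while True:
            smallest = min(children(i), key=a.__getitem__, default=i)
            if a[smallest] >= a[i]:
                return
            a[i], a[smallest] = a[smallest], a[i]
            i = smallest

    for i in range(n - 1, -1, -1):        # sift each element down the DAG
        sift_down(i)
```

Note that the DAG heap property implies the ordering invariant: in a min-heap the maximum sits at a leaf, every leaf of heap k is at most the root of heap k+1, and that root is heap k+1's minimum.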

This looks efficient and minimal; I doubt there's any redundant work. However, getting complexity bounds for it will be tricky. Classic heapify is tricky, too - it seems to have complexity O(n log n) but in fact runs in O(n) - see the nice discussion at http://stackoverflow.com/questions/9755721/how-can-building-a-heap-be-on-time-complexity. When applying heapify to the DAG, there are more restrictions and the paths are longer, so a sliver more than O(n) is to be expected.

Anyway, this looks ready for a preliminary implementation and some more serious calculations.

One more interesting thing: the heap heads are sorted, so when searching, the heap corresponding to the searched item can be found using binary search. That makes that part of the search essentially negligible - the lion's share will be the linear search on the last mile. In turn, that suggests that more, smaller heaps would be a better choice. (At an extreme, if we have an array of heaps each of size proportional to log(n), then we get O(log n) search time even though the array is not entirely sorted.)
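That search could look something like this in Python (a sketch under my own layout assumption that heap k starts at index k*k; the roots list would be precomputed in a real implementation rather than rebuilt per call):

```python
import bisect
import math

def search(a, x):
    """Membership search: binary-search the heap roots (each min-heap's
    minimum), then scan the single candidate heap linearly."""
    n = len(a)
    if n == 0:
        return -1
    roots = [a[k * k] for k in range(math.isqrt(n - 1) + 1)]
    k = bisect.bisect_right(roots, x) - 1   # last heap whose minimum is <= x
    if k < 0:
        return -1                           # x is below every element
    base = k * k
    for i in range(base, min(base + 2 * k + 1, n)):
        if a[i] == x:
            return i
    return -1
```

Heaps to the right of heap k have minima strictly greater than x, and heaps to the left have maxima at most heap k's minimum, so one heap suffices.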


Andrei

Nice to see this interesting post and to learn from it.

I have a few questions.

1) Is this an offline data structure, since you don't know what the future elements are going to be? I.e., in the dynamic case, later insertions (say, growing from n to 2n elements) could break or reshape your heaps in the worst case - or is it a dynamic data structure?

2) Searching in min- or max-heaps is bad, isn't it? Say we choose max-heaps, and the root of the second-to-last heap - around the n^2-th element - is 10^9, with children 4*10^8 and 5*10^8. If I'm searching for, say, 4.5*10^8 my job is easy, but if I'm searching for 1000, I have to descend into both subtrees, which degenerates into a linear scan of around n^2 elements in the worst case. Did I overlook anything?

Instead of heaps, wouldn't a single sorted array, or the array broken into a series of sorted arrays (skip-list style), be fine if all we want is an offline data structure?

Or is this some domain-specific data structure where you only want the max/min in some sequence?



