On Tuesday, 1 December 2015 at 22:48:56 UTC, Andrei Alexandrescu wrote:
On 12/01/2015 12:13 AM, Andrei Alexandrescu wrote:
On 11/30/15 9:47 PM, Timon Gehr wrote:
On 12/01/2015 03:33 AM, Timon Gehr wrote:
On 11/30/2015 09:57 PM, Andrei Alexandrescu wrote:
So now consider my square heaps. We have O(n) build time (just a bunch
of heapifications) and O(sqrt n) search.

How do you build in O(n)? (The initial array is assumed to be completely
unordered, afaict.)

(I meant to say: There aren't any assumptions on the initial ordering of
the array elements.)

That's quite challenging. (My O(n) estimate was off the cuff and possibly wrong.) Creating the structure entails simultaneously solving the selection problem (find the k smallest elements) for several values
of k. I'll post here if I come up with something. -- Andrei

OK, I think I have an answer to this in the form of an efficient algorithm.

First off: sizes 1+3+5+7+... seem a great choice, I'll use that for the initial implementation (thanks Titus!).

Second: the whole max-heap idea is a red herring - a min-heap is just as good, and in fact better. When searching, simply overshoot by one heap, then step back one heap to the left and do the final linear search there.

So the structure we're looking at is an array of adjacent min-heaps of sizes 1, 3, 5, etc. The heaps are ordered (the maximum of heap k is less than or equal to the minimum of heap k+1). Question is how do we build such an array of heaps in place starting from an unstructured array of size n.
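For concreteness, here's a small Python sketch (my own; the function name and the half-open-range convention are not from the thread) of where each heap lives. Since 1 + 3 + ... + (2k - 1) = k*k, heap k starts at index k*k:

```python
def heap_bounds(n):
    """Half-open index ranges of the adjacent heaps of sizes 1, 3, 5, ...
    covering n elements; heap k occupies a[k*k : (k+1)*(k+1)]."""
    bounds = []
    k = 0
    while k * k < n:
        bounds.append((k * k, min((k + 1) * (k + 1), n)))
        k += 1
    return bounds

# heap_bounds(16) -> [(0, 1), (1, 4), (4, 9), (9, 16)]
```

The last heap is simply truncated when n is not a perfect square.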

One simple approach is to just sort the array in O(n log n). This satisfies all properties - all adjacent subsequences are obviously ordered, and any subsequence has the min heap property. As an engineering approach we may as well stop here - sorting is a widely studied and well implemented algorithm. However, we hope to get away with less work because we don't quite need full sorting.
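As a sanity check, a hedged Python sketch of the two invariants (block boundaries assumed at the squares, per the 1, 3, 5, ... sizing); a sorted array passes both trivially:

```python
def is_valid(a):
    """Check both invariants: each block of size 2k+1 is a min-heap, and
    block k's maximum does not exceed block k+1's minimum (which, for a
    min-heap, is its first element)."""
    n = len(a)
    k = 0
    while k * k < n:
        lo, hi = k * k, min((k + 1) * (k + 1), n)
        block = a[lo:hi]
        for i in range(len(block)):           # min-heap property in-block
            for c in (2 * i + 1, 2 * i + 2):
                if c < len(block) and block[c] < block[i]:
                    return False
        if hi < n and max(block) > a[hi]:     # ordering between blocks
            return False
        k += 1
    return True
```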

Here's the intuition: the collection of heaps can be seen as one large heap that has a DAG structure (as opposed to a tree). In the DAG, the root of heap k+1 is the child of all leaves of heap k (see http://imgur.com/I366GYS which shows the DAG for the 1, 3, 5, and 7 heaps).

Clearly getting this structure to respect the heap property is all that's needed for everything to work - so we simply apply the classic heapify algorithm to it. It seems it can be applied almost unchanged - starting from the end, sift each element down the DAG.
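A possible Python rendition of that idea - my own sketch of the sift-down over the DAG, not a tested reference implementation. Inside a heap the usual parent/child arithmetic applies; each leaf of heap k additionally has the root of heap k+1 as its child, and we sift every element down, starting from the end:

```python
import math

def build(a):
    """Heapify the whole DAG in place. Heap k occupies a[k*k : (k+1)*(k+1)]."""
    n = len(a)

    def children(i):
        k = math.isqrt(i)                 # heap index: k*k <= i < (k+1)*(k+1)
        base, size = k * k, 2 * k + 1
        local = i - base
        if 2 * local + 1 < size:          # internal node: usual heap children
            yield base + 2 * local + 1
            if 2 * local + 2 < size:
                yield base + 2 * local + 2
        elif (k + 1) * (k + 1) < n:       # leaf: child is the next heap's root
            yield (k + 1) * (k + 1)

    def sift_down(i):
        while True:
            smallest = min(children(i), key=a.__getitem__, default=i)
            if a[smallest] >= a[i]:
                return
            a[i], a[smallest] = a[smallest], a[i]
            i = smallest

    for i in range(n - 1, -1, -1):        # sift each element down the DAG
        sift_down(i)
```

Note that the DAG heap property implies the ordering invariant: in a min-heap the maximum sits at a leaf, every leaf of heap k is at most the root of heap k+1, and that root is heap k+1's minimum.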

This looks efficient and minimal; I doubt there's any redundant work. However, getting complexity bounds for it will be tricky. Classic heapify is tricky, too - it seems to have complexity O(n log n) but in fact runs in O(n) - see the nice discussion at http://stackoverflow.com/questions/9755721/how-can-building-a-heap-be-on-time-complexity. When applying heapify to the DAG, there are more restrictions and the paths are longer, so a sliver more than O(n) is to be expected.

Anyway, this looks ready for a preliminary implementation and some more serious calculations.

One more interesting thing: the heap heads are sorted, so when searching, the heap corresponding to the searched item can be found using binary search. That makes that part of the search essentially negligible - the lion's share will be the linear search on the last mile. In turn, that suggests that more, smaller heaps would be a better choice. (At an extreme, if we have an array of heaps each of size proportional to log(n), then we get O(log n) search time even though the array is not entirely sorted.)
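That search could look something like this in Python (a sketch under my own layout assumption that heap k starts at index k*k; the roots list would be precomputed in a real implementation rather than rebuilt per call):

```python
import bisect
import math

def search(a, x):
    """Membership search: binary-search the heap roots (each min-heap's
    minimum), then scan the single candidate heap linearly."""
    n = len(a)
    if n == 0:
        return -1
    roots = [a[k * k] for k in range(math.isqrt(n - 1) + 1)]
    k = bisect.bisect_right(roots, x) - 1   # last heap whose minimum is <= x
    if k < 0:
        return -1                           # x is below every element
    base = k * k
    for i in range(base, min(base + 2 * k + 1, n)):
        if a[i] == x:
            return i
    return -1
```

Heaps to the right of heap k have minima strictly greater than x, and heaps to the left have maxima at most heap k's minimum, so one heap suffices.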


Andrei

Nice to see this interesting post and to learn from it.

I have a few questions.

1) Is this an offline data structure, since you don't know what the future elements are going to be? I.e., in the dynamic case, later insertions (say, growing from n to 2n elements) could break or reshape your heaps in the worst case - or is it a dynamic data structure?

2) Searching in min- or max-heaps is bad, isn't it? Say we choose max-heaps, and the root of the second-to-last heap - around the n^2-th element - is 10^9, with children 4*10^8 and 5*10^8. If I'm searching for, say, 4.5*10^8 my job is easy, but if I'm searching for 1000, I have to descend into both subtrees, which degenerates into a linear scan of around n^2 elements in the worst case. Did I overlook anything?

Instead of heaps, wouldn't a single sorted array, or the array broken into a series of sorted arrays (skip-list style), be fine if all we want is an offline data structure?

Or is this some domain-specific data structure where you only want the max/min in some sequence?



