On 11/30/15 3:20 PM, H. S. Teoh via Digitalmars-d wrote:
> On Mon, Nov 30, 2015 at 03:13:11PM -0500, Andrei Alexandrescu via Digitalmars-d
> wrote:
>> Okasaki's book is a continuing inspiration for data structures and
>> algorithms.
>>
>> I was thinking along the following lines: typical collections are
>> searchable in linear time. Then we have highly structured collections
>> that feature logarithmic search time. But there seems to be nothing in
>> between. So I was thinking, what would be a data structure that allows
>> O(sqrt n) search times?
>>
>> After a little tinkering, here's what I came up with.
> Interesting indeed.
>
> It leaves me wondering, though, what's the point of having an O(sqrt n)
> collection? What are the advantages? Why would I use this structure
> instead of, say, a traditional array heap with O(log n) search time?
(Heaps offer only linear search time. You may take advantage of the heap
structure to skip portions of the array, but on average and in the worst
case the search is still O(n). So I assume you meant "sorted array or
one of the classic search trees".)
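To make that concrete, here is a minimal D sketch of such a pruned search over an array-backed max-heap; heapSearch is just an illustrative name. The pruning skips subtrees whose root is already smaller than the needle, yet the worst case remains O(n):

    // Search a max-heap laid out in an array. A subtree whose root is
    // smaller than the needle cannot contain it, so it is skipped; in
    // the worst case the search still visits O(n) nodes.
    bool heapSearch(T)(const(T)[] a, T needle, size_t i = 0)
    {
        if (i >= a.length || a[i] < needle) return false; // prune subtree
        if (a[i] == needle) return true;
        return heapSearch(a, needle, 2 * i + 1)   // left child
            || heapSearch(a, needle, 2 * i + 2);  // right child
    }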
The motivation starts with a desire to use arrays as the fundamental
layout. Several recent trends in computing point the same way: cache
hierarchies are here to stay, and contiguous layout is preferable.
The short of it is, arrays are king. No two ways about it - following
pointers is a losing strategy when there's an alternative. A variety of
new data structures (Clojure's arrays, heaps with tombstones) avoid
classic pointer-based data structures in favor of adding structure on
top of arrays.
So now, if we ask "how do we organize an array for good search
performance?", we have a spectrum. Generally we also care about
insertion and removal.
At one end of the spectrum there's doing nothing at all - that means
O(1) build (nothing to do), O(n) search, O(1) insert (just append it),
O(n) removal. Not very nice.
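In D terms, that end of the spectrum looks like this (a minimal sketch):

    import std.algorithm.mutation : remove;
    import std.algorithm.searching : countUntil;

    int[] a;                      // no structure maintained
    a ~= 42;                      // O(1) insert: append
    auto i = a.countUntil(42);    // O(n) search: linear scan
    if (i >= 0) a = a.remove(i);  // O(n) removal: shifts the tail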
At the other end, the absolute best structuring on top of an array for
search is sorting. With sorting you get great O(log n) search
performance. But the others are not nice - O(n log n) build, O(n) add,
O(n) remove.
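Again as a minimal D sketch, using the standard library's binary search:

    import std.algorithm.sorting : sort;
    import std.range : assumeSorted;

    int[] a = [5, 1, 4, 2, 3];
    sort(a);                     // O(n log n) build
    auto s = a.assumeSorted;
    bool found = s.contains(3);  // O(log n) binary search
    // Keeping the array sorted makes insert and remove O(n):
    // elements must shift to preserve order.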
So now consider my square heaps. We have O(n) build time (just a bunch
of heapifications) and O(sqrt n) search. Then (again I haven't worked
out the math yet) let's assume insertion and removal are both O(sqrt n).
If so, you get something less structured than full sorting, yet
structured enough to offer the same O(sqrt n) complexity for each of
search, insert, and delete. That would be pretty neat.
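For concreteness, here is a rough D sketch of the search under one
possible layout; the specific invariants (consecutive max-heap slices of
sizes 1, 3, 5, ..., with every element of a slice <= every element of
the next) are assumptions for illustration, not the worked-out design:

    import std.algorithm.comparison : min;

    // k slices of sizes 1, 3, 5, ... hold k^2 elements, so an n-element
    // array has O(sqrt n) slices of O(sqrt n) elements each. Each slice
    // is a max-heap, so slice[0] is that slice's maximum.
    bool squareHeapSearch(T)(const(T)[] a, T needle)
    {
        size_t start = 0, len = 1;
        while (start < a.length)
        {
            auto slice = a[start .. min(start + len, a.length)];
            // Earlier slices held only smaller values; later slices hold
            // only values >= slice[0]. So if needle <= slice[0], this
            // slice is the only place it can be.
            if (needle <= slice[0])
            {
                foreach (e; slice)
                    if (e == needle) return true;  // O(sqrt n) scan
                return false;
            }
            start += len;
            len += 2;  // next slice is two elements longer
        }
        return false;
    }

Under that invariant the slice maxima act as a sorted index of O(sqrt n)
entries, so a search inspects at most O(sqrt n) roots plus one
O(sqrt n)-element slice.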
Andrei