Re: And here's another interesting algorithm/structure: Randomized Slide to Front

Marcelo Juchem via Digitalmars-d Mon, 30 Nov 2015 20:11:55 -0800

On Monday, 30 November 2015 at 21:33:31 UTC, Andrei Alexandrescu wrote:

[...]

One well-known search strategy is "Bring to front" (described by Knuth in TAoCP). A BtF-organized linear data structure is searched with the classic linear algorithm. The difference is what happens after the search: whenever the search is successful, the found element is brought to the front of the structure. If we're looking most often for a handful of elements, in time these will be near the front of the searched structure.
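
For readers who want something concrete, here is a minimal Python sketch of the Bring to Front search described above. The function name `btf_find` is made up for illustration, and the sketch assumes a plain Python list stands in for the linear structure. Note that moving an element to the front of an array shifts everything before it, so the reorganization step itself costs O(k) for an element found at position k.

```python
def btf_find(items, value):
    """Linear search; on success, move the found element to index 0.

    Hypothetical helper: the name and signature are illustrative only.
    """
    for k, item in enumerate(items):
        if item == value:
            # pop(k) removes the element and insert(0, ...) puts it first;
            # the elements that were at 0..k-1 each shift right by one.
            items.insert(0, items.pop(k))
            return True
    return False


xs = list("ABCDE")
btf_find(xs, "D")
print("".join(xs))  # prints: DABCE
```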

[...]

Another idea is to just swap the found element with the one just before it. The logic is, each successful find will shift the element closer to the front, in a bubble sort manner. In time, the frequently searched elements will slowly creep toward the front. The resulting performance is not appealing - you need O(n) searches to bring a given element to the front, for a total of O(n * n) steps spent in the n searches. Meh.

So let's improve on that: whenever an element is found in position k, pick a random number i in the range 0, 1, 2, ..., k inclusive. Then swap the array elements at indexes i and k. This is the Randomized Slide to Front strategy.
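
The two strategies just described can be sketched side by side in Python. This is only an illustration under a few assumptions: the names `transpose_find` and `rstf_find` are invented here, a Python list stands in for the array, and the optional `rng` parameter exists purely so the random choice can be seeded.

```python
import random


def transpose_find(items, value):
    """Swap the found element with its predecessor; return its new index or -1."""
    for k, item in enumerate(items):
        if item == value:
            if k > 0:
                items[k - 1], items[k] = items[k], items[k - 1]
                return k - 1
            return 0
    return -1


def rstf_find(items, value, rng=random):
    """Randomized Slide to Front: swap position k with a random i in 0..k.

    Returns the element's new index, or -1 if it is not present.
    """
    for k, item in enumerate(items):
        if item == value:
            i = rng.randint(0, k)  # randint is inclusive on both ends
            items[i], items[k] = items[k], items[i]
            return i
    return -1
```

Both functions do O(1) work after the linear scan. The difference is only in how far the found element can travel per successful search: one step for the transposition, anywhere between zero and k steps for the randomized slide.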

[...]

Insertion and removal are both a sweet O(1), owing to the light structuring: to insert just append the element (and perhaps swap it in a random position of the array to prime searching for it). Removal by position simply swaps the last element into the position to be removed and then reduces the size of the array.
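
A rough Python sketch of the insertion and removal just described follows. The names `insert` and `remove_at` are hypothetical, and the random priming swap on insert is the optional step mentioned above. With a Python list the append is amortized O(1) rather than strictly O(1).

```python
import random


def insert(items, value, rng=random):
    """Append, then swap the new element into a random slot to prime searches."""
    items.append(value)
    j = rng.randint(0, len(items) - 1)  # may pick the last slot: no-op swap
    items[j], items[-1] = items[-1], items[j]


def remove_at(items, pos):
    """Overwrite slot pos with the last element, then shrink the array by one."""
    items[pos] = items[-1]
    items.pop()


xs = [1, 2, 3, 4]
remove_at(xs, 1)
print(xs)  # prints: [1, 4, 3]
```

Removal does not preserve the relative order of the remaining elements, which is acceptable here because the structure reorders itself on every successful search anyway.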

[...]

Andrei

It seems to me you're trying to implement the array-based equivalent of Splay Trees (Splay Array rhymes, btw). Would that be a close enough description?

I'm assuming you're trying to optimize for some distribution where a minority of the elements account for the majority of queries (say, Zipfian).

Here are some ideas that come to mind. I haven't thought them through too much, so everyone's welcome to destroy me.

Rather than making index 0 always the front, use some rotating technique similar to what ring buffers do.

Say we initially have elements ABCDE (front at 0) and we search for C. We swap the element to the left of the front (cycling back to the end of the array, thus index 4) with the found element, and that slot becomes the new front. We now have the following array at hand: ABEDC, front at 4 (logically CABED).

Obviously we shouldn't change front if the queried element is already it.
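
To make the rotating-front idea easier to poke at, here is a small Python sketch. The class name `RingFrontArray` and its methods are invented for illustration, and it assumes the search scans in logical order starting at the front and wrapping around.

```python
class RingFrontArray:
    """Hypothetical sketch: array whose logical front rotates like a ring buffer."""

    def __init__(self, items):
        self.data = list(items)
        self.front = 0  # physical index of the logical front

    def logical_view(self):
        n = len(self.data)
        return [self.data[(self.front + j) % n] for j in range(n)]

    def find(self, value):
        n = len(self.data)
        for j in range(n):  # j is the logical distance from the front
            p = (self.front + j) % n
            if self.data[p] == value:
                if j != 0:  # leave the front alone if we found it there
                    t = (self.front - 1) % n  # slot just left of the front
                    self.data[p], self.data[t] = self.data[t], self.data[p]
                    self.front = t
                return True
        return False


arr = RingFrontArray("ABCDE")
arr.find("C")
print("".join(arr.data), arr.front, "".join(arr.logical_view()))
# prints: ABEDC 4 CABED
```

The reorganization is a single swap plus an index update, so it stays O(1) no matter how far from the front the element was found.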

An immediate problem with this technique is that we'll frequently pollute the front of the array with infrequent items. Say these are the numbers of queries made so far for each element: A:7, B:5, C:2, all others 0. Also, assume that this is the state of the array at this point: DEABC, front at 2. Say we now query for B. This is the new state: DBAEC, front at 1 (logically BAECD). Having E in front of C is undesirable, so we need a way to avoid that.

From now on I'll refer to indexes as the logical index. That is, let i be (front + index) % size. For the sake of brevity, let d be the distance between the element and the front = i - front. Let q be the number of successful queries performed so far.


What I have in mind boils down to deciding between:

- move a newly queried element at logical position i to the left of front (technique above). Let's call it move-pre-front for the lack of a better name;
- bubble the element up to some position between [0, i), not necessarily max(0, i - 1).

Augmenting the array with the number of queries for each element would tremendously help the decision making, but I'm assuming that's undesirable for a few reasons like:

- the array can't be transparently used in algorithms unaware of the structure;
- alignment;
- data bloating.

My immediate thought is to use some heuristic. For instance, say we have some threshold k. If d <= k, we bubble up s <= d positions to the left, where s could be computed using some deterministic formula taking d, q and/or k into account, or just randomly (Andrei's RStF). If d > k, we move-pre-front the element.

The threshold k could be computed as a factor of q. Say, sqrt(q), log q or log^2 q (logarithm base 2).
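
Here is one way the threshold heuristic could look in Python, purely as a sketch to argue over. Several things are assumptions and not part of the proposal itself: the class name `RotatingArray`, the choice k = floor(sqrt(q)), picking s uniformly at random in 1..d, and implementing "bubble up" as a single swap (in the RStF style) instead of shifting the elements in between.

```python
import math
import random


class RotatingArray:
    """Hypothetical sketch of the threshold heuristic; all names are illustrative."""

    def __init__(self, items, rng=None):
        self.data = list(items)
        self.front = 0   # physical index of the logical front
        self.hits = 0    # q: successful queries so far
        self.rng = rng or random.Random()

    def _phys(self, logical):
        # Map a logical index (0 = front) to a physical array index.
        return (self.front + logical) % len(self.data)

    def logical_view(self):
        return [self.data[self._phys(j)] for j in range(len(self.data))]

    def find(self, value):
        for d in range(len(self.data)):  # d = logical distance from the front
            p = self._phys(d)
            if self.data[p] != value:
                continue
            self.hits += 1
            k = math.isqrt(self.hits)  # assumed threshold: floor(sqrt(q))
            if d == 0:
                pass  # already at the front, nothing to do
            elif d <= k:
                s = self.rng.randint(1, d)  # assumed: random slide of s <= d
                t = self._phys(d - s)
                self.data[p], self.data[t] = self.data[t], self.data[p]
            else:
                t = self._phys(-1)  # slot just left of the front
                self.data[p], self.data[t] = self.data[t], self.data[p]
                self.front = t  # move-pre-front
            return True
        return False
```

Starting from ABCDE, the first query for C has q = 1 and k = 1, so d = 2 exceeds the threshold and the element is moved pre-front (logically CABED). A following query for A has d = 1 <= k, so it slides one position and lands at the front (logically ACBED). Whether sqrt(q) is a sensible choice compared to log q or log^2 q is something only measurements on a realistic query distribution could settle.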


Thoughts?

Marcelo
