Ian Clarke <I.Clarke at dynamicblue.com> wrote:
> Rather than simple moving data straight to the top of the datastore when
> it is re-requested, it is moved up the queue an ammount inversely
> proportional to its size.  More accurately, if the datastore is of size
> n (and here I refer to the portion of the datastore containing actual
> data, not just references) a piece of data of size p should be moved up
> by nq/p where q is the size of the smallest piece of data.  If doing
> this results in the data being moved up beyond the top of the queue then
> it is placed at the top of the queue.  This means that the smallest
> piece of data (unless I have fscked up the math) will always be moved
> straight to the top of the datastore, but other data will have to work
> harder in proportion to its size (so a piece of data twice the size of
> the smallest piece of data will have to get 2 hits to get to the top of
> the queue if it starts at the bottom).

I don't think items should zip to the top of the datastore that quickly,
though -- rather, the ordering in the datastore should reflect the
cumulative history of requests.  In the ideal world, items would be kept
always sorted by score = requests/size.  (So a set of 1k items would be
ranked by # of requests, while a 2k item with 20 hits is ranked the same as
a 1k item with 10 hits, since it takes up the space that could otherwise be
used to satisfy 20 hits divided among two 1k items).

If we don't want to keep actual records of the number of times each
item has been requested, we should use their position in the list to
approximate their past popularity.  The problem is that an unpopular
(small) item at the back of the list can erase its whole history of
unpopularity by zipping to the top with a single hit, passing many
more popular items, and then take a long time to sink back to its true
position.  Instead, items should gradually work their way up the list.
Also, newly-inserted items would be in time order, not size order.

Here's a modification: when an item is requested, it moves up by x
places, where x = 1 + log2(r/p), and p is the size of the item and r
is the size of the -largest- item in the datastore.  So every item
moves up at least one place when hit, but smaller items move up more
places.

When an item is inserted, it gets an automatic one hit.  This means
that all the items which have only been inserted but never requested
stay at the back of the queue, sorted more or less by size, while
items which are actually being used stay ahead of them.

theo

-- 
PGP: http://www.doc.ic.ac.uk/~twh1/ 
D5E5 0237 0592 CAF6 E4C4 967F 457E 9583 6AB4 876B


_______________________________________________
Freenet-dev mailing list
Freenet-dev at lists.sourceforge.net
http://lists.sourceforge.net/mailman/listinfo/freenet-dev

Reply via email to