Ref.: Routing Visualization Diagram
http://members.home.com/mwiktowy/rvd.description.html

Problem:
As previously argued by Hal and others, we want to avoid all search
index requests for a popular keyword being routed to the same focus when

such requests/inserts do not stop at the first match but carry on until
HTL runs out.

I would argue that this does not prevent us from having a search method
that does exhaust its HTL and allows keyword hash keys to be used
without insert collission detection of them.

It is all a matter of utilizing intelligent routing.

Solution:
Route search index inserts and search index requests via a combination
of keyword hash matches such that the focus is determined by the
combination of keywords rather than just one.

For example:
Define,
HKWn = hash of keyword n
Nn = Node n
C(x,y) = closeness of x to y

A node has a few references to other nodes: N1, N2, N3, N4 and N5 and it

receives an search index request or insert containing HKW1, HKW2, HKW3,
HKW4 and HKW5 that it has to route.

The node calculates the 25 permutations C({N1...5}, {HKW1...5}).
It calculates the five averages of the five closeness numbers for each
reference node. So the complete calculation would be:
avg1(C(N1,HKW1), C(N1,HKW2), C(N1,HKW3), C(N1,HKW4), C(N1,HKW5))
avg2(C(N2,HKW1), C(N2,HKW2), C(N2,HKW3), C(N2,HKW4), C(N2,HKW5))
...
avg5(....)

The node that has the highest average gets routed too.

Your method for finding the best matched data would be used. Tacked
onto the routed message would be a closeness number of the highest match

of the HKWs to the indexed data in that particular node and the hash of
that closest matching data would be kept in memory for easy access for
when the timedout message comes back carrying that best match number.

Maybe another number can be used to indicate how many hits the searcher
wants returned. If every node appends its best match on search
request, then the node that generates the timedout message can truncate
that best match list down to a number of matches that correspond to that

hit count. These matching hits can be appended to the return message as
it makes its way back to the originator.

The routing calculation can likely be streamlined by using minimum
closeness limits to prefilter the nodes to be used. As an example, if a
node turns out to be less close to a KWH than a predetermined limit
(that can be dynamically determined as an inverse relationship to the
CPU load on the node machine) then that node is dropped for subsequent
calculations). Other methods could be used to ensure a node who knows a
lot of other nodes can reduce the number of candidate nodes to do the
final calculation on.

Another way might be to place more emphasis on  KWH1 and find the n
closest nodes to it and then use those to narrow down the closest based
on all the KHWs.

This would cause popular keyword index documents to be spread to the
intersections of the key closeness regions rather than be routed to any
one KWH focus. This would prevent there being any single, small, well
defined region that houses any particular keyword index. No one node
will be swamped by inserts of index data under the KWH of "mp3" since
someone will more than likely insert their index under a collection of
KWHs ... and if they only do supply one then any node can just make up a

second or third KWH to blur the focus of single KWH inserts).

A simple 3 keyword region is illustrated in the diagram at:
http://members.home.com/mwiktowy/freenet.search.png

The focuses of each of the keywords are F1, F2 and F3.
The key-space area where a search index request or
insert for a combination of all three keywords will be routed
using this method is located at D. This area would attract those
key combinations since the average of three good closeness matches
will likely be better than the average of one or two close matches along

with a poor closeness match (as represented by areas A, B and C).

Comments?

Mike




_______________________________________________
Freenet-dev mailing list
Freenet-dev at lists.sourceforge.net
http://lists.sourceforge.net/mailman/listinfo/freenet-dev

Reply via email to