>
>              FROM: finney.org
>              DATE: 04/19/2000 12:56:32
>              SUBJECT: RE:  [Freenet-dev] Proposal for the Near Future 
> (Searching, CHKs and encryption, ..., oh my!)
>
>              I want to reiterate a comment I made earlier, with regard to 
> storing
>              things into the Freenet under a "searchkey" like mp3.  This is 
> not going
>              to work, because too many documents will use that keyword, and 
> they will
>              all try to go onto that one node (even if the "documents" are 
> just index
>              or metadata entries there are too many).

I read your concerns earlier and can see where you are coming from. There
will certainly be an increased load on the IPs that the smart routing
thinks should be the home for the hashes of popular keywords. There are
other things to consider, though. I don't know exactly how the routing
algorithm works, but it seems to me that its focus can be adjusted. What I
mean is that the "best" IP for a particular hash need not strictly be one
single IP, but rather a group of IPs. By adjusting the fuzziness of the
targeting you might reduce the efficiency of the routing mechanism by a
hop or two, but the load on the targeted server would drop by a lot more.

Also, remember that inserts are cached along the way, so a request will
likely be filled (the HTL runs out) *long* before it reaches the focus of
the insert. I suspect this will improve as the freenet gets bigger and
bigger, since each node will know a smaller and smaller proportion of the
whole freenet. Right now, each node knows the vast majority of the freenet
and the target focus is reached quickly ... in fact there is a good chance
that *the best* IP is already referenced in a node's store, so on an
insert the first hop goes straight to it. Since a node doesn't have itself
in its own reference table, it sends the insert out to its best IP. That
IP would likely try to send it back to *the best* IP, but since that one
already has it, it will send it to the second best ... etc., etc., etc.
Once the focus is reached, each node afterward will try to send it to
nodes that already have it. If this is how the routing works, then
certainly a bit of fuzziness is in order _now_, simply for normal
operation. The system is undamped and in need of some damping, to put
things in control-system terms. This damping can be provided with a random
scattering of the target match: for instance, comparing the insertion hash
against a sequential random selection of 80% of the digits of each
referenced IP during routing. This would also make sure that individual
networks are not swamped (for instance, 123.56.78.* might have a really
big affinity for the hash of the keyword mp3).
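To make the idea concrete, here is a toy sketch (Python, purely
illustrative: the SHA-1 hashing, hex-digit comparison, and the
`closeness`/`pick_next_hop` names are my assumptions, not the actual
routing code) of scoring closeness over a random ~80% sample of digit
positions, so a group of near-best nodes shares the load instead of one:

```python
import hashlib
import random
from typing import Optional

def closeness(target_hex: str, node_hex: str, sample_frac: float = 0.8,
              rng: Optional[random.Random] = None) -> int:
    """Count matching hex digits between the insert hash and a node's
    key, but only over a random sample of ~80% of the digit positions.
    The sampling blurs the single 'best' node into a group of near-best
    nodes, spreading the load of popular keywords."""
    rng = rng or random.Random()
    k = int(len(target_hex) * sample_frac)
    positions = rng.sample(range(len(target_hex)), k)
    return sum(target_hex[i] == node_hex[i] for i in positions)

def pick_next_hop(keyword: str, node_keys: list) -> str:
    """Route toward the node whose key scores highest under the fuzzy
    comparison; different calls can favour different group members."""
    target = hashlib.sha1(keyword.encode()).hexdigest()
    return max(node_keys, key=lambda nk: closeness(target, nk))
```

With `sample_frac=1.0` this degenerates back to a strict single-target
comparison; lowering it is the "fuzziness" knob described above.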

>
>              But if we do this, the primary keyword can't be a common word 
> like
>              "mp3".  It would be OK as a secondary keyword but as a primary 
> it would
>              be too common.  You could search with primary keyword = 
> "backstreet boys"
>              and secondary = "mp3", and that wouldn't have so many collisions.

Typically you would insert your data under the keywords that describe it
most specifically, and (as I remember you mentioning before) you would
likewise put the most descriptive keyword first in your search. One would
be somewhat foolish to insert a "backstreet boys" (*shudder*) mp3 under
the keywords "mp3" or "music". Rather, you would find your data more
retrievable if it (or references to it) were inserted under the keyword
"backstreet boys" or the specific title of the song. I think insertions
and requests under common keywords will be self-regulating, in that you
won't find anything of quality using them ... not that the backstreet boys
are quality, but to each their own :]

> <snip>

>
>              What I proposed was that nodes would simply refuse to accept 
> data if
>              they already have too many entries with the identical primary 
> searchkey
>              as the incoming.  So an attempt to insert an entry under 
> searchkey of
>              "mp3" would simply (and perhaps silently) fail since the node 
> would
>              already have too many such entries.  Using "backstreet boys/mp3" 
> would
>              be more likely to succeed but even that might be too much 
> crowding for
>              some nodes.  Using "backstreet boys/i want it that way/mp3" 
> would be
>              much less likely to collide.

I think limiting the number of identical KHKs in the store is a good idea,
and it will also blur the focus of the routing a bit. But it may lead to
the orbiting or undamped data insertions/requests I described above.
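The refusal rule itself is simple to sketch (Python; the `NodeStore`
class, the cap of 10, and the silent-failure return are all illustrative
assumptions, not real Freenet code):

```python
from collections import defaultdict

class NodeStore:
    """Toy data store that refuses inserts once too many entries share
    the same primary searchkey, so crowded keys like "mp3" fail here
    while more specific keys still have room."""

    def __init__(self, max_per_key: int = 10):
        self.max_per_key = max_per_key
        self.entries = defaultdict(list)

    def insert(self, searchkey: str, data: bytes) -> bool:
        bucket = self.entries[searchkey]
        if len(bucket) >= self.max_per_key:
            return False  # silently fail: this key is too crowded here
        bucket.append(data)
        return True
```

An insert under "mp3" hits the cap quickly, while "backstreet boys/mp3"
still succeeds ... which is exactly the crowding pressure described in
the quote, and also where the orbiting concern comes from: a refused
insert has to go somewhere else.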

>
>              There would therefore be a disadvantage to using primary 
> searchkeys
>              that are common.  They would be unlikely to be retained on 
> insert,
>              and therefore unlikely to return all the possible hits on 
> retrieve.
>              Using more combinations of keywords as primary searchkeys would 
> make
>              the data more likely to be available on Freenet, but at the cost 
> of
>              requiring users to specify several keywords in order to find the 
> data.

When you use a web search engine to find information on a particular book,
do you start your search with "paper", "ink", or "book"? I don't think
freenetizens will either, nor will people index their data under such
vague keywords unless they want it to die. I think specific keywords alone
are a better bet than specific keywords strapped to general ones. There
does need to be a way to search for boolean combinations of keywords
within one search attempt, however, and this can be achieved through the
method you gave earlier, where the rest of the more general keywords are
included in the document metadata. That way you can search for "backstreet
boys" AND metadata containing "content-type=book"; you wouldn't get a
flood of mp3s but rather a book about the backstreet boys (*shudder*)
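A minimal sketch of that AND-style filtering (Python; the `index` shape
and the "content-type" field name are illustrative assumptions, not an
actual Freenet API): look up the one specific primary keyword, then apply
the general terms as metadata filters locally.

```python
def search(index, primary_keyword, metadata_filters=None):
    """Fetch documents filed under one specific primary keyword, then
    keep only those whose metadata matches every filter (AND semantics).
    'index' is assumed to map keyword -> list of document dicts."""
    hits = index.get(primary_keyword, [])
    return [doc for doc in hits
            if all(doc.get("metadata", {}).get(k) == v
                   for k, v in (metadata_filters or {}).items())]

# Illustrative index: two documents filed under the same primary keyword.
index = {
    "backstreet boys": [
        {"title": "BSB: The Book", "metadata": {"content-type": "book"}},
        {"title": "i want it that way", "metadata": {"content-type": "mp3"}},
    ],
}
```

Only the specific keyword ever touches the routing; the general terms
never become searchkeys, so they add no crowding.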

Mike


_______________________________________________
Freenet-dev mailing list
Freenet-dev at lists.sourceforge.net
http://lists.sourceforge.net/mailman/listinfo/freenet-dev
