I want to reiterate a comment I made earlier about storing things in
Freenet under a common "searchkey" like "mp3".  This is not going
to work, because too many documents will use that keyword, and they will
all try to go onto the same node (even if the "documents" are just index
or metadata entries, there are too many of them).

I like the idea of storing documents under keywords, and doing
"searches" by specifying a primary keyword (that is used for routing),
with secondary keywords which must also be matched in the node before
the doc is returned.  I also like the idea of returning multiple hits
in the form of index/metadata records which then point at a CHK or other
unambiguous specifier to pull the actual data.
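A minimal sketch of that node-side lookup, assuming each node keeps its index records in a dictionary keyed by primary searchkey. `IndexRecord`, `NodeIndex`, and the `CHK@...` strings are hypothetical placeholders, not Freenet's actual data structures or key format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IndexRecord:
    """Hypothetical index/metadata entry that points at the real data."""
    primary: str               # searchkey used for routing
    secondary: frozenset[str]  # extra keywords matched inside the node
    chk: str                   # placeholder for a CHK-style pointer


@dataclass
class NodeIndex:
    """Hypothetical per-node store of index records, grouped by primary key."""
    records: dict[str, list[IndexRecord]] = field(default_factory=dict)

    def add(self, record: IndexRecord) -> None:
        self.records.setdefault(record.primary, []).append(record)

    def search(self, primary: str, secondary: set[str]) -> list[IndexRecord]:
        # Only records stored under the routing key are candidates; each one
        # must also carry every requested secondary keyword to count as a hit.
        return [r for r in self.records.get(primary, [])
                if set(secondary) <= r.secondary]


node = NodeIndex()
node.add(IndexRecord("backstreet boys", frozenset({"mp3"}), "CHK@aaa"))
node.add(IndexRecord("backstreet boys", frozenset({"video"}), "CHK@bbb"))
print([r.chk for r in node.search("backstreet boys", {"mp3"})])  # ['CHK@aaa']
```

The search returns only the small index records; the client would then fetch the actual data through each record's pointer.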

But if we do this, the primary keyword can't be a common word like
"mp3".  It would be OK as a secondary keyword but as a primary it would
be too common.  You could search with primary keyword = "backstreet boys"
and secondary = "mp3", and that wouldn't have so many collisions.

Even so, storing an index entry for every document on the net about the
Backstreet Boys could still overload the node.  I therefore proposed
making the primary key (the searchkey used for routing) a combination of
keywords in some canonical order (e.g. alphabetical, separated by
slashes, or whatever other convention we adopt).  So the primary keyword
could be "backstreet boys/mp3".  This would concentrate even fewer
entries on any single node, and a search would find only mp3s by the
Backstreet Boys.
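One possible canonical-ordering function, sketched under the assumption that keywords are lowercased and trimmed, duplicates and empties are dropped, and the rest are sorted alphabetically and joined with "/". The function name `canonical_searchkey` and the normalization steps are illustrative choices, not an agreed convention.

```python
def canonical_searchkey(keywords: list[str]) -> str:
    """Build a primary searchkey from keywords in a canonical order.

    Assumed convention: lowercase and trim each keyword, drop empties and
    duplicates, sort alphabetically, and join with "/".
    """
    cleaned = {k.strip().lower() for k in keywords if k.strip()}
    return "/".join(sorted(cleaned))


print(canonical_searchkey(["mp3", "Backstreet Boys"]))
# backstreet boys/mp3
print(canonical_searchkey(["mp3", "I Want It That Way", "backstreet boys"]))
# backstreet boys/i want it that way/mp3
```

Because the order is fixed, inserters and searchers who pick the same keywords arrive at the same routing key regardless of the order they typed them in.  A real scheme would also need to forbid or escape "/" inside a single keyword.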

What I proposed was that nodes would simply refuse to accept data if
they already hold too many entries with the same primary searchkey as
the incoming entry.  So an attempt to insert an entry under the
searchkey "mp3" would simply (and perhaps silently) fail, since the node
would already have too many such entries.  Using "backstreet boys/mp3"
would be more likely to succeed, but even that might be too crowded for
some nodes.  Using "backstreet boys/i want it that way/mp3" would be
much less likely to collide.
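A sketch of the refusal rule, assuming a fixed per-searchkey limit. The email gives no number, so `MAX_ENTRIES_PER_KEY` and its value are hypothetical, as is the `CappedIndex` class; `insert` returns False for clarity where a real node might just drop the entry silently.

```python
from collections import defaultdict

MAX_ENTRIES_PER_KEY = 3  # hypothetical limit; no value is specified


class CappedIndex:
    """Hypothetical node index that refuses entries for crowded searchkeys."""

    def __init__(self, limit: int = MAX_ENTRIES_PER_KEY) -> None:
        self.limit = limit
        self.entries: defaultdict[str, list[str]] = defaultdict(list)

    def insert(self, searchkey: str, pointer: str) -> bool:
        """Store pointer under searchkey; return False if the key is full."""
        bucket = self.entries[searchkey]
        if len(bucket) >= self.limit:
            return False  # refused: too many entries already share this key
        bucket.append(pointer)
        return True


node = CappedIndex(limit=2)
print([node.insert("mp3", f"CHK@{i}") for i in range(3)])  # [True, True, False]
print(node.insert("backstreet boys/mp3", "CHK@x"))          # True
```

The more specific key succeeds because its bucket is counted separately from the crowded "mp3" bucket.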

There would therefore be a disadvantage to using common primary
searchkeys: entries under them would be unlikely to be retained on
insert, so searches on them would be unlikely to return all the possible
hits.  Using more specific combinations of keywords as primary
searchkeys would make the data more likely to be available on Freenet,
but at the cost of requiring users to specify several keywords in order
to find it.

Hal

_______________________________________________
Freenet-dev mailing list
Freenet-dev at lists.sourceforge.net
http://lists.sourceforge.net/mailman/listinfo/freenet-dev
