> > I still don't see how the insert and search will find each other.
> > Think of a network with 100,000 nodes, and the data you want has been
> > inserted, but you are the first person fetching it. It lies on maybe
> > 10 nodes in the network. There are 10,000 to 1 odds against any given
> > node holding it. This means you need accurate routing if your search
> > is going to find it in 10 hops.
>
> Of course you do! This would be inevitable with *any* search mechanism,
> you must be very specific to find rare data, but once some "pioneers"
> have stumbled upon the data a few times, then paths will form, and it
> will become common enough to be found easily. I think insertions of
> search data should also have higher HTLs for this reason.

The issue is not so much whether *any* search mechanism has undesirable
properties; it's whether this particular search mechanism will work.
It's not clear that a search mechanism with these properties will be
adequate for the things people want to do with searching.
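
For scale, here is the arithmetic behind the 100,000-node example quoted
above, as a small Python sketch. It assumes the worst case of no useful
routing at all, so each of the 10 hops is an independent, uniformly random
probe. That assumption and the function name are mine, purely for
illustration; this is not how Freenet actually routes.

```python
def blind_hit_probability(nodes: int, holders: int, hops: int) -> float:
    """Chance that at least one of `hops` uniform random probes lands on a holder.

    Assumes probes are independent and uniformly distributed over all nodes,
    i.e. routing contributes nothing (a deliberately pessimistic baseline).
    """
    miss_once = 1.0 - holders / nodes
    return 1.0 - miss_once ** hops


# Numbers from the example: 100,000 nodes, data on 10 of them, 10 hops.
p = blind_hit_probability(nodes=100_000, holders=10, hops=10)
print(f"chance of a blind hit: {p:.4%}")
```

Under that assumption the chance comes out at roughly one in a thousand,
which is why everything depends on the insert and the search being routed
toward the same place.
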

For one thing, it appears that the more keywords are used during an
insert, the less likely the data is to be found. (You haven't said much
about how inserts would be routed, so I am not 100% certain about this.)
The reason is that searches would need to use substantially the same
keywords as the insert in order to route to the same place. If I insert
with 12 keywords and someone only searches for four of them, the search
is unlikely to find the data (the first time) because those other eight
keywords will influence where the insert went, and the search won't
know them.
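
To make the 12-versus-4 case concrete, here is a rough sketch. Since the
routing of inserts hasn't been specified, I am assuming that "closeness"
behaves something like the Jaccard similarity of the two keyword sets.
That measure, the `jaccard` helper, and the `kw0`..`kw11` keywords are all
hypothetical stand-ins, not part of any proposal.

```python
def jaccard(a, b):
    """Shared keywords divided by total distinct keywords (1.0 = identical sets)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical keyword sets for the 12-keyword insert and the 4-keyword search.
insert_keys = {f"kw{i}" for i in range(12)}
search_keys = {f"kw{i}" for i in range(4)}  # searcher guesses 4 of the 12
short_doc = set(search_keys)  # some other document inserted under just those 4

sim_to_insert = jaccard(search_keys, insert_keys)
sim_to_other = jaccard(search_keys, short_doc)
print(f"search vs 12-keyword insert: {sim_to_insert:.2f}")
print(f"search vs 4-keyword insert:  {sim_to_other:.2f}")
```

Under this assumed measure the search resembles the 12-keyword insert far
less than it resembles anything inserted under only the four guessed
keywords, so it would tend to be pulled toward the latter.
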

But since we're not able to do text searches like on the web, having
useful and guessable keywords is going to be much more important.
It won't be easy to come up with just four or five keywords that are
both relatively unique and the same as what someone who wants to read
the document would guess.

How would you envision something like a discussion board or Usenet working
with this system? One possibility would be to insert under two keywords,
a topic (like comp.protocols.freenet) and a subject ("why searching will
not work"). These have to be used in an "orthogonal" manner to direct
the insert and search; that is, we can't hash them both together into one
hash. But now we will find that all the comp.protocols.freenet inserts
may get grouped together since they will have considerable similarity.
So we have the bad effect that similar topic data gets grouped together.
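
The grouping effect can be seen in a toy model. This sketch assumes an
insert goes to whichever node already holds the item sharing the most
keywords with it, with ties going to the least loaded node. That placement
rule, the four-node network, and every post other than the
comp.protocols.freenet example are made up for illustration.

```python
def overlap(keys, stored):
    """Largest number of keywords `keys` shares with any item on a node."""
    return max((len(keys & item) for item in stored), default=0)


def insert(nodes, keys):
    """Store `keys` on the best-matching node and return that node's index.

    Assumed rule: highest keyword overlap wins; ties go to the node with
    the fewest items, then to the lowest index.
    """
    best = max(
        range(len(nodes)),
        key=lambda i: (overlap(keys, nodes[i]), -len(nodes[i]), -i),
    )
    nodes[best].append(keys)
    return best


nodes = [[] for _ in range(4)]  # a hypothetical four-node network
posts = [
    ("comp.protocols.freenet", "why searching will not work"),
    ("rec.food.cooking", "bread"),
    ("comp.protocols.freenet", "routing inserts"),
    ("comp.protocols.freenet", "HTL values"),
    ("rec.food.cooking", "soup"),
]
# Each post is inserted under two orthogonal keywords: (topic, subject).
placement = [insert(nodes, frozenset(post)) for post in posts]
print(placement)
```

In this toy run every comp.protocols.freenet post ends up on the same node,
and two of the four nodes stay empty.
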

If I add additional keywords, like date and author, then finding the
data will be more difficult. It might get inserted to nodes that have
other messages I wrote on that same date (which match on date and author)
rather than the ones holding the topic.
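
A rough count shows why, assuming placement simply follows the number of
matching keywords. The `topic:`/`subject:`/`author:`/`date:` prefixes, the
placeholder date `D`, and the two existing items are hypothetical; only the
newsgroup name and subject come from the example above.

```python
def shared(a, b):
    """Number of keywords two items have in common."""
    return len(a & b)


new_post = {
    "topic:comp.protocols.freenet",
    "subject:why searching will not work",
    "author:hal",
    "date:D",  # placeholder for whatever day the message is inserted
}

# An item on a node that holds other messages in the same topic.
topic_item = {
    "topic:comp.protocols.freenet",
    "subject:routing inserts",
    "author:someone-else",
    "date:other-day",
}

# An unrelated message of mine from the same day, held on a different node.
my_other_post = {
    "topic:some.other.group",
    "subject:unrelated",
    "author:hal",
    "date:D",
}

topic_match = shared(new_post, topic_item)
author_date_match = shared(new_post, my_other_post)
print(topic_match, author_date_match)
```

With these sets the unrelated message matches on two keywords (author and
date) while the same-topic item matches on only one, so a match-count rule
would send the new post away from the topic.
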

Maybe I am missing the point here. Is the goal not to make searches
that work (nearly) as well as on the web, but merely to make something
that makes it easier to find data than the current system, where we have
to guess the entire insert key? If so, then I agree that some fuzzy
matching can make this work better (at the cost of grouping data on
similar topics). But I still doubt that this is going to satisfy people.

Hal