awlydick wrote: [Searching for metadata]
> Also, we only need to generate three different searches, one for each
> keyword. As some wise soul suggested earlier, we really need to include
> the other keywords that are being searched for within each of the
> broken-up searches. But only route the searches with a single keyword.

We can include the other keywords in the search request, but if the nodes subsequently don't find enough matches then we should also return partial matches. Partial matches have both:

a) matched one or more metadata criteria, and
b) have the potential for matching the rest.

However, if we return metadata as part of the search result, we should *only* store the data that is relevant to the routing path we have chosen (which could have been specified in the search request by a special tag). Otherwise we reduce the nodes' potential for clustering data by wasting disk space on data that shouldn't be routed there anyway. This way, the user gets all the data, but clustering efficiency stays high.

> So. We break a search into multiple searches (one per keyword) and
> route them as we would normal Freenet keys. They are smart-routed until
> their HTL expires, and are not executed in parallel. The user stops
> sending

In addition to the HTL expiring, we can specify maximum_matches, and can expire requests once this has been reached or exceeded. We decrease maximum_matches by the number of results we matched.

> Well. There you have it. Long as hell. Boring to read. Fun to write
> though :-)

I think a definitive paper on metadata and searching could be longer than Ian's paper.

> I think that it could work, and I don't see anything glaringly wrong
> with it given the debate that I've read so far. But poke lots of holes
> in it and I'll try and scramble to fix it. Have fun.

I am 100% confident that metadata searching and clustering work. Freenet should scale beautifully, unlike other search mechanisms such as web spiders, which cannot hope to keep up with the growth in data creation.

What you describe is very similar to the method I am trying to write up. There is a lot to consider in finding the best way to match keys, because there are two reasons for doing so: we want to route requests to the best node to enable clustering, and we also want to match keys in a human-sensible way. For example, "The Bible" and "Bible" could be considered close, since we need only four deletions/insertions to move between them. We could also know that "The " is pretty meaningless when it comes to searches.

And imagine the case where someone receives 100 matches, but none are suitable. We should be able to start another request, but specify an 'ignore' list: a list of CHKs we are not interested in. That way we can successively get an extra 100 matches at a time.

Keep brainstorming guys :)
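
P.S. To make some of this concrete, here are a few rough Java sketches. All class, field, and method names below are illustrative assumptions, not real Freenet code. First, breaking a search into per-keyword requests that are each routed on a single keyword but still carry the full keyword set (the example keywords are made up):

    import java.util.ArrayList;
    import java.util.List;

    public class SearchSplitter {

        // One single-keyword request that still knows about its siblings.
        public static class SearchRequest {
            final String routingKeyword;    // the keyword used for smart routing
            final List<String> allKeywords; // everything the user asked for
            int htl;                        // hops-to-live, as for a normal key request

            SearchRequest(String routingKeyword, List<String> allKeywords, int htl) {
                this.routingKeyword = routingKeyword;
                this.allKeywords = allKeywords;
                this.htl = htl;
            }
        }

        // "king james bible" -> three requests, each routed on one keyword.
        public static List<SearchRequest> split(List<String> keywords, int htl) {
            List<SearchRequest> requests = new ArrayList<>();
            for (String kw : keywords)
                requests.add(new SearchRequest(kw, keywords, htl));
            return requests;
        }

        public static void main(String[] args) {
            for (SearchRequest r : split(List.of("king", "james", "bible"), 10))
                System.out.println("route on '" + r.routingKeyword
                        + "', carrying " + r.allKeywords);
        }
    }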
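
The partial-match rule could then look something like this. I am assuming here that "has the potential for matching the rest" means the item's metadata is incomplete, so the unmatched keywords are not yet ruled out:

    import java.util.List;
    import java.util.Set;

    public class MatchClassifier {

        public enum Match { FULL, PARTIAL, NONE }

        public static Match classify(Set<String> itemKeywords,
                                     boolean metadataComplete,
                                     List<String> queryKeywords) {
            long hits = queryKeywords.stream()
                    .filter(itemKeywords::contains).count();
            if (hits == queryKeywords.size())
                return Match.FULL;
            // a) matched one or more criteria, and
            // b) could still match the rest (metadata incomplete)
            if (hits > 0 && !metadataComplete)
                return Match.PARTIAL;
            return Match.NONE;
        }
    }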
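
The maximum_matches idea is simple enough to sketch directly: each answering node decrements the budget by the results it contributed, and the request expires once the budget is reached or exceeded rather than travelling until its HTL runs out:

    public class MatchBudget {
        private int maximumMatches;

        public MatchBudget(int maximumMatches) {
            this.maximumMatches = maximumMatches;
        }

        // Called after a node contributes `resultsFound` matches.
        // Returns true if the request should keep travelling.
        public boolean consume(int resultsFound) {
            maximumMatches -= resultsFound;
            return maximumMatches > 0; // expire once reached/exceeded
        }
    }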
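
For human-sensible key matching, plain edit distance already gives the "The Bible"/"Bible" example its four deletions, and stripping a meaningless leading "The " makes the two keys identical (the stopword list here is an illustrative guess):

    public class KeyDistance {

        // Classic Levenshtein distance (insertions/deletions/substitutions).
        public static int editDistance(String a, String b) {
            int[] prev = new int[b.length() + 1];
            int[] curr = new int[b.length() + 1];
            for (int j = 0; j <= b.length(); j++)
                prev[j] = j;
            for (int i = 1; i <= a.length(); i++) {
                curr[0] = i;
                for (int j = 1; j <= b.length(); j++) {
                    int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                    curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                       prev[j - 1] + cost);
                }
                int[] tmp = prev; prev = curr; curr = tmp;
            }
            return prev[b.length()];
        }

        // Drop a leading word that carries no search meaning.
        public static String normalize(String key) {
            return key.replaceFirst("(?i)^(the|a|an)\\s+", "");
        }

        public static void main(String[] args) {
            System.out.println(editDistance("The Bible", "Bible"));            // 4
            System.out.println(editDistance(normalize("The Bible"), "Bible")); // 0
        }
    }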
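
And finally the 'ignore' list for fetching successive batches of matches: nodes would skip CHKs the requester has already seen, so each round yields up to maximum_matches fresh results:

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;
    import java.util.stream.Collectors;

    public class IgnoreListSearch {

        // CHKs the requester has already seen and rejected.
        private final Set<String> ignoredChks = new HashSet<>();

        // Filter a node's candidates against the ignore list,
        // returning at most `maximumMatches` fresh CHKs.
        public List<String> nextBatch(List<String> candidateChks, int maximumMatches) {
            List<String> fresh = candidateChks.stream()
                    .filter(chk -> !ignoredChks.contains(chk))
                    .limit(maximumMatches)
                    .collect(Collectors.toList());
            ignoredChks.addAll(fresh); // don't return these again next round
            return fresh;
        }
    }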
