>
> From: "Scott G. Miller" <scgmille at te-42-2.teter.indiana.edu>
> Subject: Re: [Freenet-dev] Searching freenet
>
> The only concern I have is with the number of calculations necessary for a
> node to determine the next hops. But you addressed that. I haven't run
> this through my noodle enough yet, but it doesn't 'smell' wrong.
I think searching will inevitably entail more calculations; there is nothing
we can do to avoid that. The trick is keeping them to a minimum while still
offering a useful search mechanism. You could also reduce the impact of
the calculation by limiting the number of keywords (search engines do this
transparently on the web) and by lowering the priority of a search request
relative to a data retrieval.
Unfortunately, I don't have the programming experience to implement my
proposal to see if it works as I envision.
> Are we all agreeing then that basically the search system is a separate,
> document independent keyspace, and that all we're trying to do is figure
> out how to arrange the keyspace and route requests in it?
>
> Scott
I'm not sure that anything has been agreed upon. Ian is putting together some
generic fuzzy operators to do search-match comparisons with. I don't think
these could be used for my proposed keyword-closeness routing, but they
will certainly be used for any kind of data matching to find the best
match at each node.
I think a few things have to be straightened out, though. Ian talks about
"plain-text" keys being searched against, and I'm not sure exactly what he has
in mind here. It would be a good idea to nail down the format that searchable
data will take before trying to find a method to look for it.
I would imagine that it will consist of a different type of data
entry than an ordinary message in the stores, maybe one allowing only
metadata that just refers to the CHK it describes. But under what key
would it be stored? One of the keywords? A combination of all the keywords?
How about storing:
CHK.info
KWH.1=...
KWH.2=...
...
plain-text-keyword.1=...
...
Signature=...
Version=...
mime-type=...
Description="..."
{other metadata ...}
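As a rough illustration, such a record could be built like this. Everything
here is an assumption for the sake of the sketch: the field names follow the
layout above, SHA-1 stands in for whatever keyword hash is actually chosen,
and the helper names are made up.

```python
import hashlib

def keyword_hash(keyword):
    """Hash a keyword to a fixed-length hex digest.  SHA-1 is purely
    illustrative; the real hash function is undecided."""
    return hashlib.sha1(keyword.lower().encode("utf-8")).hexdigest()

def make_search_record(chk, keywords, plain_keywords, description,
                       version=None, mime_type=None):
    """Build a CHK.info-style search-index entry as a dict of metadata
    fields, roughly matching the layout sketched above."""
    record = {"CHK.info": chk}
    # Hashed keywords used for routing (KWH.1, KWH.2, ...).
    for i, kw in enumerate(keywords, 1):
        record["KWH.%d" % i] = keyword_hash(kw)
    # Plain-text keywords left available for fuzzy matching.
    for i, kw in enumerate(plain_keywords, 1):
        record["plain-text-keyword.%d" % i] = kw
    if version is not None:
        record["Version"] = version
    if mime_type is not None:
        record["mime-type"] = mime_type
    record["Description"] = description
    return record
```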
This is set up by the initial search index insert and routed via
the KWHs. When new inserts for that same CHK come in, new info is
appended to the CHK.info entry in the stores (i.e. new keywords).
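The store-side append could behave something like the sketch below: merge the
new insert's keyword fields into the existing entry, renumbering so the
indices stay contiguous. This is hypothetical behaviour, not anything Freenet
does today.

```python
def merge_search_record(existing, update):
    """Append new keyword fields from a later insert into an existing
    CHK.info entry, keeping KWH.n / plain-text-keyword.n indices
    contiguous.  Hypothetical store-side behaviour."""
    merged = dict(existing)
    for prefix in ("KWH.", "plain-text-keyword."):
        # Values already present under this prefix.
        have = {v for k, v in merged.items() if k.startswith(prefix)}
        n = len(have)
        for k, v in update.items():
            if k.startswith(prefix) and v not in have:
                n += 1
                merged["%s%d" % (prefix, n)] = v
                have.add(v)
    return merged
```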
Ian's fuzzy operators can work on the plain-text portions of the metadata
to match a boolean expression in the search index request
(contains (kw1 and kw2 not kw3) and (version > 4.2))
which is routed via KWH1 and KWH2, not KWH3. You can use the NOT
operator in the routing too, to fine-tune the part of the key-space where
the request ends up (i.e. this would force the routing into area A rather
than D in the diagram at: http://members.home.com/mwiktowy/freenet.search.png).
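A minimal sketch of how a node might evaluate such a boolean expression
against a stored record, assuming the dict-style fields above (KWH.n,
Version) and SHA-1 keyword hashes. `matches` is a hypothetical helper for
illustration, not Freenet code or Ian's actual operators.

```python
import hashlib

def keyword_hash(keyword):
    # Illustrative: SHA-1 of the normalized keyword.
    return hashlib.sha1(keyword.lower().encode("utf-8")).hexdigest()

def matches(record, required, excluded, min_version=None):
    """Check a CHK.info-style record (dict of metadata fields) against
    a simple boolean query: all `required` keywords present, none of
    `excluded` present, and an optional strict version floor."""
    hashes = {v for k, v in record.items() if k.startswith("KWH.")}
    if not all(keyword_hash(kw) in hashes for kw in required):
        return False
    if any(keyword_hash(kw) in hashes for kw in excluded):
        return False
    if min_version is not None:
        # (version > min_version), as in the example expression.
        if float(record.get("Version", 0)) <= min_version:
            return False
    return True
```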
When the author comes up with a new version of something, they can just
insert a new CHK.info search index referring to the new file using the
same keywords. Another insert under the old CHK.info could refer to the
new update. In this way, effective searching largely takes care of
updating documents on Freenet.
I think using hashed keywords on insert would be best for obscurity
reasons (you don't want people knowing what you're inserting), even
though you can't fuzzy-match them (not meaningfully, anyway).
But other plain-text keywords can be included that can be matched in
a fuzzy way.
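The trade-off shows up clearly in a small sketch: a hashed keyword supports
only exact matching (and an observer sees just the digest), while plain-text
keywords admit fuzzy matching. Here `difflib` stands in for Ian's fuzzy
operators, and all names and thresholds are illustrative assumptions.

```python
import difflib
import hashlib

def keyword_hash(keyword):
    # Illustrative: SHA-1 of the normalized keyword.
    return hashlib.sha1(keyword.lower().encode("utf-8")).hexdigest()

# Hashed keywords: only exact (normalized) matches are possible, and
# the digest reveals nothing about the word itself.
assert keyword_hash("Freenet") == keyword_hash("freenet")
assert keyword_hash("freenet") != keyword_hash("freenets")

def fuzzy_match(query, plain_keywords, threshold=0.8):
    """Fuzzy-match a query against plain-text keywords using a simple
    similarity ratio; a stand-in for real fuzzy operators."""
    return any(
        difflib.SequenceMatcher(None, query.lower(), kw.lower()).ratio()
        >= threshold
        for kw in plain_keywords)
```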
Mike
_______________________________________________
Freenet-dev mailing list
Freenet-dev at lists.sourceforge.net
http://lists.sourceforge.net/mailman/listinfo/freenet-dev