On Sat, 2003-08-16 at 21:23, Jeremy Caleb Heffner wrote:
> I wasn't really referring to inserting the index as a typical key.  The
> reason why not is because I think this idea is based upon combining indexes
> so that they slowly grow and 'centralize' in a way, which I don't think is a
> good idea in a distributed system like this.
> 
> What I was referring to was instituting a new Search message type in the
> Freenet protocol (think somewhat like how Gnutella does this, but our own
> version of course).
> 
> The way this message would work is very similar to the way a chain is formed
> to pass key data back to the requester, which preserves the anonymity of the
> searcher.  Caching the search results along the chain also protects the
> privacy of the local index because it would contain both locally indexed
> data and indexes it relayed.  Each time a keyword was searched for, its
> results would be distributed to more nodes, expanding the ability to search
> for that keyword and lessening the bandwidth consumed for X number of results.
> 
> Would this work? (I am not claiming to be an expert of any kind, just
> throwing out an idea for a searching system that doesn't rely on
> conglomerating indexes.)
> 
> Jeremy



You should look up FASD.  FASD is a search mechanism that someone
designed a few years ago so that searching could be done in Freenet, but
the design has lain dormant for a while.  It used a cosine correlation
function to determine "closeness" to certain metadata.
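
For anyone unfamiliar with the term, here is a minimal sketch of a generic
cosine similarity between two term-weight vectors.  This is not FASD's
actual code; the function name and the dict-of-weights representation are
my own assumptions, used only to illustrate what "closeness" means here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (dicts: term -> weight).

    Hypothetical helper for illustration; not taken from FASD.
    """
    # Dot product over the terms present in `a` (missing terms count as 0).
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as matching nothing
    return dot / (norm_a * norm_b)
```

A query and a metadata entry with the same relative term weights score
close to 1, and two with no terms in common score 0.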

The problem I see with a metadata-key system is that it suffers the same
problem as the META tags search engines used to index sites.  How are
you going to prove that the indexes are honest and correct?  FASD wanted
to make the metadata used for querying decentralized from an insertion
standpoint.  That is, publishers were responsible for inserting metadata
into Freenet.  This means that you have to trust the metadata keys that
were inserted.  FASD does have a culling mechanism so that metadata
could be validated and deleted, but this system seems like it would be
expensive to execute on a large-scale network.


The idea I have for a search function is to have different search engine
sites in Freenet.  The search engine maintainers would gather data from
Freenet by spidering/hand-picking/doing whatever they feel like to
generate this index.  When a user goes to this site, a submit form sends
a command to a client program (probably integrated with FProxy) to
execute a search across the indexes.

Indexes are stored in the following format:

[EMAIL PROTECTED]

where Keyword would be a listing of certain sites that would correlate
to that keyword, along with "weights" for each site.  A large search
index would have many of these keyword pages. So a search for "movies"
would look for

[EMAIL PROTECTED]

which might contain the following:

17 [EMAIL PROTECTED]
15 [EMAIL PROTECTED]
7 [EMAIL PROTECTED]
5 [EMAIL PROTECTED]
3 [EMAIL PROTECTED]

and the search result page returned through FProxy would contain those
pages in that order.  It is up to the author of the search engine to
determine how to weight the keys.  A search engine author could do a
Google-like PageRank that weights a site based on links between sites
after spidering through Freenet.  On the other hand, an author could
generate a searchable directory site that stores specific content (for
example, a site containing all of the books from Project Gutenberg).
Yet a third search-engine author could take several of these indexes
from different sites to create a bigger, better index.  A filesharing
app running on Freenet could also be designed to publish its own indexes
and combine them with others.  So the general idea here is to provide a
search system to fit everyone's needs.
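
To make the keyword-page idea concrete, here is a rough sketch of how a
client might read one such page and order the results.  It assumes each
line is "<weight> <key>" as in the listing above; the function name and
the KEY_A-style placeholders are made up for illustration and are not
real Freenet keys.

```python
def parse_keyword_page(text):
    """Parse lines of the form '<weight> <key>' into (key, weight) pairs,
    sorted with the highest weight first.  Hypothetical helper."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        weight_str, key = line.split(None, 1)  # split on first whitespace run
        entries.append((key, int(weight_str)))
    # The sort is stable, so entries with equal weight keep their page order.
    entries.sort(key=lambda entry: entry[1], reverse=True)
    return entries


page = """\
7 KEY_C
17 KEY_A
15 KEY_B
"""
results = parse_keyword_page(page)
```

The client would then render `results` as the result page in that order.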


Multiple-keyword searches could be done by requesting the index page for
each of those keywords and taking the intersection of those indexes.
The weights for a specific site across all of those pages could be
combined by some simple formula (such as addition).
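
As a sketch of the multiple-keyword case, assuming each keyword page has
already been read into a dict mapping key to weight, and using addition
as the combining formula.  The function name and the sample keys and
weights are placeholders I made up for illustration.

```python
def combine_keyword_pages(pages):
    """Intersect several keyword pages and add up the weights.

    `pages` is a list of dicts (key -> weight), one per keyword.
    Returns (key, combined_weight) pairs, highest weight first.
    """
    if not pages:
        return []
    # Keep only the keys that appear on every keyword page.
    common = set(pages[0])
    for page in pages[1:]:
        common &= set(page)
    # Addition is the "simple formula"; any other could be swapped in.
    combined = {key: sum(page[key] for page in pages) for key in common}
    # Sort by descending weight, breaking ties by key for a stable order.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


movies = {"KEY_A": 17, "KEY_B": 15, "KEY_C": 7}
comedy = {"KEY_B": 4, "KEY_C": 20, "KEY_D": 9}
ranked = combine_keyword_pages([movies, comedy])
```

Only KEY_B and KEY_C appear on both pages here, so only those two would
show up in a search for both keywords.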


Thoughts?


Scott Young

_______________________________________________
devl mailing list
[EMAIL PROTECTED]
http://hawk.freenetproject.org:8080/cgi-bin/mailman/listinfo/devl
