> I think it would be better if the client application calculated the
> scoring. I assume that the 'weight' you mention here is somehow
> calculated from where in the page the word was found and how 'valuable'
> in the given page the word is and so on..

The "weight" could be calculated by any means the index-publisher wants,
but it should generally indicate the relevance of a certain page to a
certain keyword.
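
To make that concrete, here is a rough sketch of how a client might use
the weights once it has fetched the index page for a keyword.  The entry
layout, key names, and weight values are all made up for illustration;
nothing here is a fixed format.

```python
# Hypothetical contents of the index page for one keyword:
# (page_key, weight) pairs, where a higher weight means "more relevant".
# How the publisher arrives at each weight is entirely up to them.
index_page = [
    ("page/a", 0.40),
    ("page/b", 0.95),
    ("page/c", 0.10),
]


def rank(entries, limit=10):
    """Return at most `limit` entries, highest weight first."""
    return sorted(entries, key=lambda entry: entry[1], reverse=True)[:limit]


top = rank(index_page, limit=2)
```

The client only sorts and truncates; it never needs to know how the
publisher computed the numbers.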


> I would recommend that the index file contained this information instead
> <Information Domain> <Relative position> <KEY>

So basically you're saying more metadata should be stored on these index
pages so that better queries can be done.  I can see two ways that we
can handle orthogonal metadata: include it with the data in a particular
index, or include it in its own separate index that uses the mechanism
above.  For example, if you want to have a song search engine, you could
have one index for the names of the songs and another index for the
artists.  Orthogonal metadata like genre and bitrate could be stored
along with the entries instead of in their own indexes.  If they were
stored in their own indexes, then the page for "128 kbps" would be
unacceptably HUGE, since it would have to list every song at that bitrate.
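
Here is a minimal sketch of the song example, assuming each index is
just a mapping from a word to a list of entries.  The Entry fields, the
key strings, and the search helpers are hypothetical names I made up for
illustration, not a proposed format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    key: str      # hypothetical key of the song itself
    genre: str    # orthogonal metadata stored inline with the entry
    bitrate: int  # kbps, also stored inline


# Two separate indexes: one keyed by title word, one keyed by artist.
title_index = {
    "sunrise": [Entry("song/1", "rock", 128), Entry("song/2", "jazz", 192)],
}
artist_index = {
    "artist_a": [Entry("song/1", "rock", 128), Entry("song/3", "rock", 128)],
}


def search(index, word, genre=None, min_bitrate=0):
    """Look up one word, then filter on the inline metadata."""
    return [
        e for e in index.get(word, [])
        if (genre is None or e.genre == genre) and e.bitrate >= min_bitrate
    ]


def search_both(title_word, artist_word):
    """Entries that appear under both the title word and the artist."""
    titles = set(search(title_index, title_word))
    return [e for e in search(artist_index, artist_word) if e in titles]
```

Filtering on bitrate happens on the client after the lookup, so there is
never a "128 kbps" page to fetch.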

This is starting to look like a database.  Databases need less storage
space if they are normalized.  With multiple indexes, pages could be
stored like this:


[EMAIL PROTECTED]/mySearch/keys/keys1

_______________________________________________
devl mailing list
[EMAIL PROTECTED]
http://hawk.freenetproject.org:8080/cgi-bin/mailman/listinfo/devl
