On Sun, Aug 17, 2003 at 07:03:24PM -0400, Scott Young wrote:
> On Sun, 2003-08-17 at 18:24, Scott Young wrote:
> > > I think it would be better if the client application calculated the
> > > scoring. I assume that the 'weight' you mention here is somehow
> > > calculated from where in the page the word was found and how 'valuable'
> > > in the given page the word is and so on..
> >
> > The "weight" could be calculated by any means the index-publisher wants,
> > but it should generally indicate the relevance of a certain page to a
> > certain keyword.
> >
> > > I would recommend that the index file contained this information instead
> > > <Information Domain> <Relative position> <KEY>
> >
> > So basically you're saying more metadata should be stored on these index
> > pages so that better queries can be done. I can see two ways that we
> > can handle orthogonal metadata: include it with the data in a particular
> > index, or include it in its own separate index that uses the mechanism
> > above. For example, if you want to have a song search engine, you could
> > have an index for the name of the song, and another index for the
> > artists. Orthogonal metadata like Genre and bitrate could be stored
> > along with the entries instead of in their own indexes. If they were
> > stored in their own indexes, then the page for "128 kbps" would be
> > unacceptably HUGE.
> >
> > This is starting to look like a database. Databases need less storage
> > space if they are normalized. With multiple indexes, pages could be
> > stored like this:
> >
> > [EMAIL PROTECTED]/mySearch/keys/keys1
>
> Sorry... my message got loose before it was done (Evolution is acting
> up).
>
> As I was saying, you could have a listing of every page that your search
> system indexes, split across several files.
> For example:
>
> [EMAIL PROTECTED]/mySearch/keys/keys1
> would contain
> 1 "[EMAIL PROTECTED]/ItDontMeanAThing.mp3" "Jazz" "Ella Fitzgerald"
> 2 "[EMAIL PROTECTED]/Help.mp3" "Rock" "The Beatles"
> ...
>
> The first number is just an index number. It is used so that any other
> indexes only need to store that index number and its weight (which
> saves space when a certain file is indexed in multiple places). Other
> data that is in a 1-to-1 correspondence with the key can also be put
> here.
>
> The "keys1" page would contain entries 1 through 100, "keys2" would
> contain 101 through 200, etc.
>
> An index like this:
> [EMAIL PROTECTED]/mySearch/indexes/artist
> would contain pages that contain information about specific artists.
>
> For example:
> [EMAIL PROTECTED]/mySearch/indexes/artist/Beatles
> could list:
> 2 10
> 15 10
> 30 10
> 34234 10
> 545 10
> where the first number refers to the index number of a song in the
> "keys" directory, and the second number is the weight. When querying
> the Beatles page, the search engine would then request pages 1, 6, and
> 343 (which contain the keys for the songs listed in this particular
> index).
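
The mapping from index numbers to key pages can be sketched in a few lines. This is only an illustration, assuming 100 entries per page with "keys1" holding entries 1 through 100 as described in the quoted text; PAGE_SIZE, page_for_entry and pages_to_request are names made up for this sketch and are not part of any Freenet API.

```python
from collections import defaultdict

PAGE_SIZE = 100  # assumption: entries per key page, so "keys1" holds 1..100


def page_for_entry(index_number: int) -> int:
    """Return the 1-based key page number that holds a given entry."""
    if index_number < 1:
        raise ValueError("index numbers start at 1")
    return (index_number - 1) // PAGE_SIZE + 1


def pages_to_request(index_entries):
    """Group (index_number, weight) pairs by the key page they live on."""
    pages = defaultdict(list)
    for index_number, weight in index_entries:
        pages[page_for_entry(index_number)].append((index_number, weight))
    return dict(pages)


# The Beatles artist page from the example: (index number, weight)
beatles = [(2, 10), (15, 10), (30, 10), (34234, 10), (545, 10)]
needed = pages_to_request(beatles)
print(sorted(needed))  # key pages the search engine would have to fetch
```

Under that page layout, entries 2, 15 and 30 share one key page, so the five index entries cost three key-page requests rather than five.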
I wouldn't. Compress it if you like, but freenet is lossy, remember?

> There could also be an index for the song title. A user could say "Give
> me every Beatles song named Help that is at least 160 kbps." The search
> engine could then come up with a query plan. The query plan would
> probably be to search the artist index for "Beatles," then look in the
> Song Title index for "Help," take the intersection, take the resulting
> key index numbers and look them up in the key pages, and then filter
> the results on the bitrate.
>
> I guess what I'm getting at is more than just a search capability. It
> could actually work as a database on freenet, albeit a high-latency
> one. Searching would be the immediately obvious application of a
> database system, but there might be other later uses.
>
> So the big question is, who wants to write an RDBMS over Freenet so
> Freenet can get some really good searching capability?
>
> _______________________________________________
> devl mailing list
> [EMAIL PROTECTED]
> http://hawk.freenetproject.org:8080/cgi-bin/mailman/listinfo/devl

-- 
Matthew J Toseland - [EMAIL PROTECTED]
Freenet Project Official Codemonkey - http://freenetproject.org/
ICTHUS - Nothing is impossible. Our Boss says so.
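
The query plan described in the quoted message (artist index, title index, intersection, key lookup, bitrate filter) might look roughly like the sketch below. Everything in it is a hypothetical in-memory stand-in: the dictionaries replace pages that would really be fetched from Freenet at high latency, the sample entries are invented, the bitrate is assumed to be stored alongside each key entry, and ranking by the sum of the two weights is an assumption of this sketch rather than something the thread specifies.

```python
# Invented sample data standing in for fetched pages.
# KEYS: index number -> (key, artist, bitrate in kbps)
KEYS = {
    2: ("Help.mp3", "The Beatles", 192),
    15: ("Help.mp3", "The Beatles", 128),
    30: ("Yesterday.mp3", "The Beatles", 192),
    77: ("Help.mp3", "Some Other Band", 192),
}
# Each index page maps index number -> weight.
ARTIST_INDEX = {"Beatles": {2: 10, 15: 10, 30: 10}}
TITLE_INDEX = {"Help": {2: 10, 15: 10, 77: 10}}


def search(artist, title, min_kbps):
    """Run the artist/title/bitrate query plan over the sample indexes."""
    by_artist = ARTIST_INDEX.get(artist, {})
    by_title = TITLE_INDEX.get(title, {})
    # Step 1: intersect the index numbers found in both indexes.
    candidates = by_artist.keys() & by_title.keys()
    results = []
    for n in candidates:
        # Step 2: look the index number up in the key pages.
        key, _artist, kbps = KEYS[n]
        # Step 3: filter on the metadata stored with the entry.
        if kbps >= min_kbps:
            score = by_artist[n] + by_title[n]  # assumed ranking rule
            results.append((score, n, key))
    # Highest combined weight first, index number as a tie-breaker.
    results.sort(key=lambda r: (-r[0], r[1]))
    return [(n, key) for _score, n, key in results]


hits = search("Beatles", "Help", 160)
print(hits)
```

Intersecting before touching the key pages keeps the number of key-page fetches down, which matters when every fetch is a slow network request.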
