On Thursday 10 Jul 2008, Navjot Kukreja wrote:
> Hi everyone,
>
> I saw that there are a lot of open source site-search scripts
> available on the web. I'm wondering how exactly they do their job.
> Apart from indexing every single word in every single page, how do
> they sort the 'keywords' according to relevance to each page so that
> the results are more accurate? Or, in other words, when I search for
> a particular word, it probably lists all pages that contain the word.
> What I want to know is what algorithm determines the order of
> results. The question still might seem too general.
>
> I want a script to parse a certain collection of pages and create
> keyword-to-page relationships, giving a weight to each relation that
> represents how relevant that keyword is to that particular page.
> Does such a script exist? Because I can't seem to find such a thing.
> If it doesn't, I can't even think of a suitable way to implement
> this. Can someone shed light here please?
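The keyword-to-page weighting asked about above can be sketched roughly as follows. This is a minimal illustration that assumes TF-IDF weighting, one common textbook scheme; it is not necessarily what PostgreSQL's ranking functions or any particular site-search script do. The names `tokenize`, `build_index` and `search`, and the sample `pages` dictionary, are made up for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Return {keyword: {page_name: weight}} using TF-IDF weights.

    `pages` is a hypothetical {page_name: page_text} mapping.
    """
    term_counts = {name: Counter(tokenize(text)) for name, text in pages.items()}

    # Document frequency: in how many pages does each word occur?
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    n_pages = len(pages)
    index = defaultdict(dict)
    for name, counts in term_counts.items():
        total = sum(counts.values())
        for word, count in counts.items():
            tf = count / total                         # share of the page taken by this word
            idf = math.log(n_pages / doc_freq[word])   # rarer across pages -> larger
            index[word][name] = tf * idf
    return index


def search(index, word):
    """Return (page_name, weight) pairs for a word, best match first."""
    hits = index.get(word.lower(), {})
    return sorted(hits.items(), key=lambda item: item[1], reverse=True)


# Made-up sample data, only to show the call pattern.
pages = {
    "a.html": "postgresql text search ranks documents",
    "b.html": "linux kernel mailing list",
    "c.html": "search search search the kernel source",
}
for page, weight in search(build_index(pages), "search"):
    print(page, round(weight, 3))
```

A word that appears on every page gets a weight of zero under this scheme, since it does not help tell pages apart. Real search engines usually add stemming, stop-word removal and length normalisation on top of something like this.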
Googling for 'keyword document rank' turns up some decent links.

If you want to look at a specific implementation, check out the PostgreSQL text search facility. Section 12.3.3 ("Ranking Search Results") at http://www.postgresql.org/docs/8.3/static/textsearch-controls.html has some pointers, and the source may be worth a read too.

Regards,

-- Raju
-- 
Raj Mathur    http://kandalaya.org/