On Thursday 10 Jul 2008, Navjot Kukreja wrote:
> Hi everyone,
> I saw that there are a lot of open source site-search scripts
> available on the web. I'm wondering how exactly they do their job.
> Apart from indexing every single word in every single page, how do
> they sort the 'keywords' according to relevance to each page so that
> the results are more accurate? In other words, when I search for
> a particular word, the script probably lists all pages that contain the
> word. What I want to know is which algorithm determines the order of
> the results. The question may still seem too general.
> I want a script to parse a certain collection of pages and create
> keyword to page relationships giving weight to each relation, which
> represents how relevant that keyword is to that particular page.
> Does such a script exist? I can't seem to find one.
> If it doesn't, I can't even think of a suitable way to implement
> this. Can someone shed some light on this, please?

Googling for 'keyword document rank' results in some decent links.
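One textbook way to get the keyword-to-page weights you describe is
TF-IDF: a word weighs more on a page the more often it occurs there
(term frequency), and less the more pages it occurs on (inverse document
frequency). Results for a query are then ordered by the summed weights of
the query words. Below is a minimal Python sketch, assuming the pages are
already available as plain text in a dict of page name to text; the
function names are hypothetical, and this is not claimed to be what any
particular site-search script does (PostgreSQL's ts_rank, for one, does
not use document frequencies at all).

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each keyword to {page_name: tf-idf weight}.

    pages: dict of page_name -> page text (a hypothetical input format).
    """
    n_pages = len(pages)
    term_counts = {name: Counter(tokenize(text)) for name, text in pages.items()}

    # Document frequency: on how many pages does each word appear?
    df = Counter()
    for counts in term_counts.values():
        df.update(counts.keys())

    index = defaultdict(dict)
    for name, counts in term_counts.items():
        total = sum(counts.values())
        for word, count in counts.items():
            tf = count / total                  # share of the page taken up by this word
            idf = math.log(n_pages / df[word])  # words found on fewer pages weigh more
            index[word][name] = tf * idf
    return index


def search(index, query):
    """Return names of pages containing any query word, best match first."""
    scores = defaultdict(float)
    for word in tokenize(query):
        for name, weight in index.get(word, {}).items():
            scores[name] += weight
    return sorted(scores, key=lambda name: scores[name], reverse=True)
```

With three small made-up pages, say "install.html" ("Installing Debian:
boot the Debian installer and partition the disk."), "kernel.html"
("Compiling a kernel on Debian.") and "mail.html" ("Setting up a mail
server with Postfix."), a search for "debian installer" ranks install.html
first, because it mentions both words and "installer" is rarer than
"debian"; kernel.html comes second and mail.html is not listed. A word
that appears on every page gets an IDF of zero and so contributes nothing
to the ordering.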

If you want to look at a specific implementation, check out the 
PostgreSQL text search facility.  Section 12.3.3 ("Ranking Search 
Results") at 
http://www.postgresql.org/docs/8.3/static/textsearch-controls.html 
describes its ts_rank and ts_rank_cd ranking functions, and the source 
may be worth a read too.

Regards,

-- Raju
-- 
Raj Mathur                [EMAIL PROTECTED]      http://kandalaya.org/
       GPG: 78D4 FC67 367F 40E2 0DD5  0FEF C968 D0EF CC68 D17F
PsyTrance & Chill: http://schizoid.in/   ||   It is the mind that moves

_______________________________________________
ilugd mailinglist -- ilugd@lists.linux-delhi.org
http://frodo.hserus.net/mailman/listinfo/ilugd
Archives at: http://news.gmane.org/gmane.user-groups.linux.delhi 
http://www.mail-archive.com/ilugd@lists.linux-delhi.org/
