The database is a merge of about 5 DBs, and contains around 20K
documents. The only relation between the documents is that they
are all hosted in the same region.

I have rebuilt the database (merging in another DB) and checked
the merge log, but did not find any errors.

Now the search for "buskett" returns only a very small
number of irrelevant pages. But when I searched for "david
mifsud", 7 of the first 10 results were irrelevant (1-4, 6-8).

The search algorithm I'm using is exact:1

BTW, by irrelevant I mean that I loaded the page, searched its
source for both of the words david and mifsud, and found
neither of them!
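
For anyone who wants to repeat that check without clicking through
every hit, here is a minimal sketch of it in Python. It assumes you
have already saved or fetched the page source as a string; the
function names missing_terms and is_irrelevant are made up for this
example and have nothing to do with ht://Dig itself. It does a plain
case-insensitive substring match, the same as a "find" in the
browser's source view.

```python
def missing_terms(page_source, query):
    """Return the query words that do not occur anywhere in the page source."""
    text = page_source.lower()
    return [word for word in query.lower().split() if word not in text]


def is_irrelevant(page_source, query):
    """True when none of the query words occur in the page source."""
    words = query.lower().split()
    # An empty query is never counted as an irrelevant hit.
    return bool(words) and len(missing_terms(page_source, query)) == len(words)


if __name__ == "__main__":
    # Hypothetical page source, only for illustration.
    sample = "<html><body>Walks around Buskett Gardens</body></html>"
    print(is_irrelevant(sample, "david mifsud"))
    print(missing_terms(sample, "buskett gardens"))
```

Note that a substring match will also accept a word that only appears
inside a tag attribute or a longer word, so it is a slightly looser
test than a word-level search.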

http://alpha.CompuCreations.com/search/

regards,

dave


* From [EMAIL PROTECTED]  Sat Nov 27 21:58:28 1999
* To: Dave <[EMAIL PROTECTED]>
* Subject: Re: [htdig] irrelevant pages in search
* Cc: [EMAIL PROTECTED]
* 
* At 10:54 AM +0100 11/18/99, Dave wrote:
* >Try it out at:
* >    http://alpha.CompuCreations.com/search/
* >
* >Words I have tried include "buskett" (results 2/3/6/10 are
* >irrelevant, i.e. 40% from the 1st page!)
* 
* I tried it out when you first sent the message and again now--I see 
* that a few of the results are irrelevant, but I'm not so sure all of 
* those you mention are irrelevant. At the least, I can see why they're 
* being flagged.
* 
* You don't mention how many pages you have in your database or how 
* closely related they are. Offhand, I think some of your "irrelevant" 
* pages are scoring highly because they have a high backlink weight. 
* You might try lowering the backlink_factor 
* <http://www.htdig.org/attrs.html#backlink_factor>
* 
* This factor weights the "importance" of pages, essentially as the 
* ratio of the number of links pointing to a page to the number of 
* links on the page. (The ratio helps to remove "link farms", which 
* often have many links to them.)
* 
* Hope that helps,
* 
* -Geoff Hutchison
* Williams Students Online
* http://wso.williams.edu/
* 
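
To make the ratio Geoff describes concrete, here is a rough sketch in
Python. It only computes links-in divided by links-out for a small,
made-up link graph. How ht://Dig actually scales this by
backlink_factor and combines it with the word scores is not shown,
and the max(1, ...) guard for pages without outgoing links is my own
assumption, not something taken from the ht://Dig source.

```python
from collections import Counter


def backlink_ratios(links):
    """links maps each page URL to the list of URLs that page links to.

    Returns, per page, (links pointing to the page) / (links on the page).
    """
    # Count how many links point at each target across the whole graph.
    inbound = Counter(target for targets in links.values() for target in targets)
    ratios = {}
    for page, targets in links.items():
        # Assumption: treat a page without outgoing links as having one,
        # purely to avoid dividing by zero.
        ratios[page] = inbound[page] / max(1, len(targets))
    return ratios


if __name__ == "__main__":
    # Hypothetical three-page site: "farm" links out a lot, "article" is linked to.
    graph = {
        "farm": ["article", "about", "article2", "article3"],
        "about": ["farm"],
        "article": ["farm"],
    }
    for page, ratio in sorted(backlink_ratios(graph).items()):
        print(page, ratio)
```

A page that carries many links of its own gets a large denominator, so
its ratio stays low even when many pages point to it, which is the
"link farm" effect mentioned above.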

------------------------------------
To unsubscribe from the htdig mailing list, send a message to
[EMAIL PROTECTED]
You will receive a message to confirm this.
