On Dec 6, 2004, at 4:53 AM, Chris Fraschetti wrote:
My lucene implementation works great, its basically an index of many web crawls. The main thing my users complain about is say a search for "slashdot" will return the http://www.slashdot.org/soem_dir/somepage.asp as the top result because the factors i have scoring it determine it as so... but obviously in true search engine fashion.. i would like http://www.slashdot.org/ to be the very top result... i've added a boost to queries that match the hostname field, which helped a little, but obviously not a proper solution. Does anyone out there in the search engine world have a good schema for determining root websites and applying a huge boost to them in one fashion or another? mainly so it appears before any sub pages? (assuming the query is in reference to that site) ...
Consider applying the boost to the Document, rather than the field, at index time. I assume each document in your index represents one page. At indexing time you know whether it is a root page or not, right?
Erik
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]