Hi Bart,
thanks for your response. I found it very interesting.
I created two plugins: the first runs during indexing and the second during 
search. Instead of boosting only one particular field, I boost the whole 
document, and I get good results.
Could somebody explain in a little more detail the Solr class
org.apache.solr.search.function.FunctionQuery?
 
Thanks.
 
Milan Krendzelak

Senior Software Developer
dotMobi (mTLD Top Level Domain, Ltd.)
11 Exchange Place, IFSC, Dublin 1, Ireland
Phone: + 353.1.854.1100 Fax: +353.1.791.8569

________________________________

From: Robeyns Bart [mailto:[EMAIL PROTECTED]
Sent: Fri 22/06/2007 17:56
To: [EMAIL PROTECTED]
Subject: RE: How to score a paticular page higher than the other pages



I think you might also want to look at the ScoringFilter-extension point in 
Nutch. ScoringFilters define the 'boost-factor' for each indexed document; the 
boost-factor is used by Lucene to boost the scoring for a document in any given 
query.
Nutch by default applies its 'OPICScoringFilter', which defines the boost as a 
function of the inlinks to a document.

You can write your own ScoringFilter that either manipulates the OPIC-defined 
boost (by chaining it after OPICScoringFilter) or simply defines its own. I'm 
not sure, but it looks like you might get hold of both the document's URL and 
its content in the 'passScoreBeforeParsing' method, which gets called at 
fetch time.

One way to implement your own filter for your purpose:
Check whether the URL is a forum thread, parse the content with a validating 
HTML parser and derive a score from it; multiply the obtained score by the 
one that gets passed in (that's the OPIC score) and return the result.
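
The recipe above can be sketched in plain Java. This is only an illustration 
of the arithmetic, not a Nutch plugin: ForumThreadBoost, isForumThread and 
boost are hypothetical names that do not belong to the ScoringFilter API, the 
showthread.php test is an assumption based on the example URLs in this thread, 
and the validation score is assumed to be computed elsewhere (for instance by 
the validating parser) and passed in.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper for illustration; not part of the Nutch API. */
final class ForumThreadBoost {

    private ForumThreadBoost() {}

    /**
     * Assumption: a forum thread is any URL whose path ends in
     * "/showthread.php", as in the example URLs from this thread.
     */
    static boolean isForumThread(String url) {
        try {
            String path = new URI(url).getPath();
            return path != null && path.endsWith("/showthread.php");
        } catch (URISyntaxException e) {
            // An unparsable URL is treated as "not a thread".
            return false;
        }
    }

    /**
     * Multiplies the incoming (OPIC) score by the validation score for
     * forum threads. Other pages keep the incoming score unchanged,
     * which is a choice made for this sketch.
     */
    static float boost(String url, float opicScore, float validationScore) {
        if (!isForumThread(url)) {
            return opicScore;
        }
        return opicScore * validationScore;
    }
}
```

In a real filter, the returned value would be handed back from whichever 
ScoringFilter method receives the incoming score.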

Make sure to define the order in which the scoring filters are applied with 
the scoring.filter.order property in nutch-site.xml:
<property>
  <name>scoring.filter.order</name>
  <value>org.apache.nutch.scoring.opic.OPICScoringFilter <yourfilter></value>
</property>

Hope this helps,

Bart


-----Original Message-----
From: Milan Krendzelak [mailto:[EMAIL PROTECTED]
Sent: Fri 6/22/2007 18:21
To: [EMAIL PROTECTED]
Subject: RE: How to score a paticular page higher than the other pages

Hi Ann, I would be really interested in custom boosting of documents by 
some external factor.
For example, the external factor could be the score of a web page's validation 
against the W3C standards.
In this case, I would like to order the search results not only by relevancy 
but also by the validation score. I don't want to sort by it; I want the 
external factor to influence the document boost.

I would appreciate any help or suggestions about this.
Thanks.

Milan Krendzelak

Senior Software Developer
dotMobi (mTLD Top Level Domain, Ltd.)
11 Exchange Place, IFSC, Dublin 1, Ireland
Phone: + 353.1.854.1100 Fax: +353.1.791.8569

________________________________

From: Annona Keene [mailto:[EMAIL PROTECTED]
Sent: Fri 22/06/2007 17:06
To: [EMAIL PROTECTED]
Subject: Re: How to score a paticular page higher than the other pages



Hi Harmesh,

I did something similar to this, and I can offer a few suggestions. I'm not 
sure any of these is the *right* answer, but I've found them to be effective 
for my purposes.

Have a field that you boost significantly for the page(s) you'd like higher in 
the results. We have something like a keywords field where we put terms that 
should bring up a page very high in the results. It's worked quite well.

Another thing we've done is use the FunctionQuery from Solr. (Though I believe 
this might be part of Lucene 2.2.0 proper now. I'm not certain.)  The API is 
here: 
http://lucene.apache.org/solr/api/org/apache/solr/search/function/FunctionQuery.html
This allows you to assign some numerical value in a field for your site, then 
the score is influenced by the value. So in our case, we set high values for 
the "important" pieces of the site and smaller values for less "important" 
pieces.
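
A rough sketch of that idea in plain Java, not Solr's actual implementation: 
it assumes the value stored in the numeric field is scaled by a weight and 
added to the text-relevance score. FieldValueScore, combinedScore and the 
weight parameter are hypothetical names chosen for this illustration.

```java
/** Hypothetical illustration; not the Solr FunctionQuery class. */
final class FieldValueScore {

    private FieldValueScore() {}

    /**
     * Combines a text-relevance score with a per-document numeric field
     * value. Assumption: the two are summed, with the field value scaled
     * by a weight that controls how strongly it influences the ranking.
     */
    static float combinedScore(float relevance, float fieldValue, float weight) {
        return relevance + weight * fieldValue;
    }
}
```

With a weight of 0.5, a page with relevance 1.0 and field value 4.0 gets 
1.0 + 0.5 * 4.0 = 3.0 and so ranks above a page with relevance 2.0 and field 
value 1.0, which gets 2.5.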

If any of this was unclear, or I didn't actually answer your question, please 
let me know. I've done a lot of "hacking" to influence the result rankings.

Have a great day,
Ann

----- Original Message ----
From: "Harmesh, V2solutions" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Thursday, June 21, 2007 5:06:00 AM
Subject: How to score a paticular page higher than the other pages


Hi,
  Can anyone help me score a particular page higher than the others?
For example:

In my case I crawl forums, where threads are more important than the other
pages.
I want links like (http://forum.ottawagolf.com/showthread.php) at the top of
my results. Instead, I am getting links like
(http://forums.roadbikereview.com/forumdisplay.php), which I would prefer at
a lower priority.
Thanks in advance.

--
View this message in context: 
http://www.nabble.com/How-to-score-a-paticular-page-higher-than-the-other-pages-tf3957718.html#a11230107
Sent from the Nutch - User mailing list archive at Nabble.com.








      
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
