Hi Bart,
Thanks for your response; I found it very interesting.
I created two plugins: the first runs at indexing time and the second at search time.
Instead of boosting only one particular field, I boost the whole document, and
I get good results.
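For context on why boosting the whole document behaves differently from boosting one field: in Lucene 2.x with DefaultSimilarity, each field's index-time norm is, roughly, the document boost times the field boost times lengthNorm = 1/sqrt(number of terms in the field). A document boost therefore scales every field of the document, while a field boost scales only that field. The plain-Java sketch below reproduces just that arithmetic; it ignores the lossy one-byte encoding Lucene applies to norms, and the class name is hypothetical.

```java
/**
 * Hypothetical helper mirroring the Lucene 2.x DefaultSimilarity norm arithmetic:
 * norm = documentBoost * fieldBoost * lengthNorm, with lengthNorm = 1 / sqrt(numTerms).
 * Lucene stores this value lossily in a single byte; that step is ignored here.
 */
final class NormSketch {
    private NormSketch() {}

    /** Length normalisation: shorter fields get a larger factor. */
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** Index-time norm for one field of one document. */
    static float norm(float documentBoost, float fieldBoost, int numTerms) {
        return documentBoost * fieldBoost * lengthNorm(numTerms);
    }
}
```

For a single field the two boosts are interchangeable (norm(2, 1, n) equals norm(1, 2, n)); the difference is that the document boost is applied to every field of the document, so matches in any field benefit.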
Could somebody explain in a little more detail how the Solr class
org.apache.solr.search.function.FunctionQuery works?
Thanks.
Milan Krendzelak
Senior Software Developer
dotMobi (mTLD Top Level Domain, Ltd.)
11 Exchange Place, IFSC, Dublin 1, Ireland
Phone: + 353.1.854.1100 Fax: +353.1.791.8569
________________________________
From: Robeyns Bart [mailto:[EMAIL PROTECTED]
Sent: Fri 22/06/2007 17:56
To: [EMAIL PROTECTED]
Subject: RE: How to score a particular page higher than the other pages
I think you might also want to look at the ScoringFilter extension point in
Nutch. ScoringFilters define the boost factor for each indexed document; Lucene
uses that boost factor to raise the score of the document in any query it
matches.
Nutch by default applies its 'OPICScoringFilter', which defines the boost as a
function of the inlinks to a document.
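To illustrate why an inlink-based boost favours well-linked pages, here is a simplified, single-round sketch of OPIC-style "cash" distribution: every page splits its current score equally among its outlinks, and each target adds the shares it receives. This is a hypothetical model of the idea, not the code of Nutch's OPICScoringFilter; it ignores internal/external link weighting and any transformation applied at index time, and in this sketch the source page keeps its own score.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical single round of OPIC-style score distribution (not Nutch's code):
 * each page divides its current score equally among its outlinks, and each target
 * adds the received shares to its own score. Pages with many inlinks accumulate
 * higher scores, which a scoring filter could then turn into a higher boost.
 */
final class OpicSketch {
    private OpicSketch() {}

    static Map<String, Float> distributeOnce(Map<String, Float> scores,
                                             Map<String, List<String>> outlinks) {
        // Start from the current scores; shares are computed from the old values only.
        Map<String, Float> next = new HashMap<>(scores);
        for (Map.Entry<String, List<String>> e : outlinks.entrySet()) {
            List<String> targets = e.getValue();
            if (targets.isEmpty()) continue;
            float share = scores.getOrDefault(e.getKey(), 0f) / targets.size();
            for (String target : targets) {
                next.merge(target, share, Float::sum);
            }
        }
        return next;
    }
}
```

With three pages at score 1.0, where a links to c and b links to a and c, one round gives a = 1.5, b = 1.0 and c = 2.5: the page with the most inlinks ends up with the largest score.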
You can write your own ScoringFilter that either manipulates the OPIC-defined
boost (by chaining it after OPICScoringFilter) or simply defines its own. I'm
not sure, but it looks like you might get hold of both the document's URL and
its content in the passScoreBeforeParsing method, which gets called at
fetch time.
One way to implement your own filter for your purpose:
check whether the URL is a forum thread, parse the content with a validating
HTML parser and derive a score from it; multiply the obtained score by the
one that gets passed in (that's the OPIC score) and return the result.
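The recipe above can be sketched in plain Java, deliberately not written against the real Nutch ScoringFilter interface. Two assumptions are made and should be adapted: forum threads are recognised by a path ending in showthread.php (as in the URLs from the original question), and the validating parser is assumed to report a count of markup errors, which is turned into a factor between 1.0 and 2.0. The class and method names are hypothetical.

```java
import java.net.URI;

/**
 * Hypothetical sketch of the recipe: forum-thread pages get a factor derived from
 * a validation result, multiplied into the incoming (OPIC) score; all other pages
 * keep the incoming score unchanged. Not the Nutch ScoringFilter API.
 */
final class ForumThreadBoost {
    private ForumThreadBoost() {}

    /** Assumption: forum threads are served by a ".../showthread.php" path. */
    static boolean isForumThread(String url) {
        String path = URI.create(url).getPath(); // query string (e.g. ?t=1) is excluded
        return path != null && path.endsWith("/showthread.php");
    }

    /**
     * Assumption: the validating parser reports how many markup errors it found.
     * Zero errors gives 2.0; the factor decays towards 1.0 as errors increase.
     */
    static float validationScore(int markupErrors) {
        if (markupErrors < 0) throw new IllegalArgumentException("negative error count");
        return 1.0f + 1.0f / (1 + markupErrors);
    }

    /** Multiply the derived score by the one passed in, as the recipe suggests. */
    static float score(String url, int markupErrors, float incomingScore) {
        if (!isForumThread(url)) return incomingScore;
        return incomingScore * validationScore(markupErrors);
    }
}
```

Keeping the multiplication (rather than replacing the incoming score) preserves the link-based ordering among forum threads while still lifting them above other pages.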
Make sure to define the order in which scoring filters are applied with the
scoring.filter.order property in nutch-site.xml:
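<property>
  <name>scoring.filter.order</name>
  <!-- your.package.YourScoringFilter is a placeholder for your own filter's class name -->
  <value>org.apache.nutch.scoring.opic.OPICScoringFilter your.package.YourScoringFilter</value>
</property>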
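Why the order matters can be modelled in a few lines: the space-separated value of scoring.filter.order selects the filters, and each filter receives the score produced by the one before it. The sketch below is a hypothetical model, with filters represented as UnaryOperator<Float> instead of Nutch's ScoringFilter interface and with made-up filter names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/**
 * Hypothetical model of an ordered scoring-filter chain: the order string selects
 * filters by name, and each filter is applied to the previous filter's output.
 */
final class FilterChain {
    private FilterChain() {}

    /** Resolve a space-separated order string against the available filters. */
    static List<UnaryOperator<Float>> resolve(String order,
                                              Map<String, UnaryOperator<Float>> available) {
        List<UnaryOperator<Float>> chain = new ArrayList<>();
        for (String name : order.trim().split("\\s+")) {
            UnaryOperator<Float> f = available.get(name);
            if (f == null) throw new IllegalArgumentException("unknown filter: " + name);
            chain.add(f);
        }
        return chain;
    }

    /** Feed the initial score through every filter in order. */
    static float apply(List<UnaryOperator<Float>> chain, float initScore) {
        float score = initScore;
        for (UnaryOperator<Float> f : chain) {
            score = f.apply(score);
        }
        return score;
    }
}
```

With two toy filters, "add 1" followed by "double" turns 1.0 into 4.0, while the reverse order gives 3.0; a purely multiplicative filter like the one described above is insensitive to position only if every other filter in the chain is multiplicative too.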
Hope this helps,
Bart
-----Original Message-----
From: Milan Krendzelak [mailto:[EMAIL PROTECTED]
Sent: Fri 6/22/2007 18:21
To: [EMAIL PROTECTED]
Subject: RE: How to score a particular page higher than the other pages
Hi Ann, I would be really interested in custom boosting of documents by
some external factor.
For example, the external factor could be the page's validation score against
the W3C standards.
In this case, I would like to rank search results not only by relevance but
also by the validation score. I don't want to sort by that score; rather, I want
the external factor to influence the document boost.
I would appreciate any help or suggestions about this.
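The difference between sorting by an external factor and folding it into the score can be shown with three made-up hits. Sorting lets the validation score override relevance entirely, while multiplying it into the relevance score only reorders documents whose relevance is close. The sketch below is a hypothetical illustration; the hit values and names are invented.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical comparison of "sort by external factor" versus "boost by external
 * factor" for hits that carry a relevance score and a validation factor.
 */
final class SortVersusBoost {
    private SortVersusBoost() {}

    /** A search hit with a text-relevance score and an external validation factor. */
    static final class Hit {
        final String id;
        final float relevance;
        final float validation;

        Hit(String id, float relevance, float validation) {
            this.id = id;
            this.relevance = relevance;
            this.validation = validation;
        }
    }

    /** Sorting: validation decides the order outright; relevance only breaks ties. */
    static List<String> sortByValidation(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort((a, b) -> {
            int byValidation = Float.compare(b.validation, a.validation);
            return byValidation != 0 ? byValidation : Float.compare(b.relevance, a.relevance);
        });
        return ids(copy);
    }

    /** Boosting: validation is multiplied into the score, so relevance still counts. */
    static List<String> rankByBoostedScore(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort((a, b) -> Float.compare(b.relevance * b.validation, a.relevance * a.validation));
        return ids(copy);
    }

    private static List<String> ids(List<Hit> hits) {
        List<String> out = new ArrayList<>();
        for (Hit h : hits) out.add(h.id);
        return out;
    }
}
```

With A (relevance 5.0, validation 1.0), B (1.0, 1.5) and C (4.0, 1.5), sorting yields C, B, A, pushing the barely relevant B above the highly relevant A; boosting yields C, A, B.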
Thanks.
Milan Krendzelak
________________________________
From: Annona Keene [mailto:[EMAIL PROTECTED]
Sent: Fri 22/06/2007 17:06
To: [EMAIL PROTECTED]
Subject: Re: How to score a particular page higher than the other pages
Hi Harmesh,
I did something similar to this, and I can offer a few suggestions. I'm not
sure any of these is the *right* answer, but I've found them effective for
my purposes.
Have a field that you boost significantly for the page(s) you'd like higher in
the results. We have something like a keywords field where we put terms that
should bring up a page very high in the results. It's worked quite well.
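One way to apply such a field boost at query time is Lucene's QueryParser syntax, where term^boost raises the weight of a single clause. The helper below is a hypothetical sketch: the field names "keywords" and "content" and the boost value are assumptions, and the term is assumed to be a single word without QueryParser special characters (real input would need escaping).

```java
/**
 * Hypothetical helper that builds a Lucene QueryParser string in which matches in
 * a "keywords" field weigh far more than matches in the main "content" field.
 * Field names and boost are assumptions; the term is not escaped.
 */
final class KeywordBoostQuery {
    private KeywordBoostQuery() {}

    static String build(String term, float keywordsBoost) {
        // QueryParser syntax: field:term^boost boosts that clause only.
        return "keywords:" + term + "^" + keywordsBoost + " content:" + term;
    }
}
```

For example, build("golf", 10f) produces keywords:golf^10.0 content:golf, so a page listing "golf" in its keywords field outranks pages that merely mention it in their text. Boosting the field at index time instead is an alternative that avoids changing every query.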
Another thing we've done is use the FunctionQuery from Solr. (I believe this
might be part of Lucene 2.2.0 proper now, but I'm not certain.) The API is
documented here:
http://lucene.apache.org/solr/api/org/apache/solr/search/function/FunctionQuery.html
This lets you store a numeric value in a field for each page of your site; the
score is then influenced by that value. In our case, we set high values for
the "important" pieces of the site and smaller values for less "important" pieces.
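As I understand it, a FunctionQuery matches every document and scores each one by a value source (such as a numeric field) times the query's boost; when it is added as an optional clause next to the text query in a BooleanQuery, the clause scores are roughly summed. The plain-Java sketch below models only that additive combination, with an assumed weight, and deliberately ignores Lucene's queryNorm and coord factors; the names are hypothetical.

```java
/**
 * Hypothetical model of combining text relevance with a per-document "importance"
 * field value: the function clause contributes functionWeight * importance, which is
 * added to the text score. Lucene's queryNorm and coord factors are ignored.
 */
final class FunctionBoostSketch {
    private FunctionBoostSketch() {}

    static double combined(double textScore, double importance, double functionWeight) {
        return textScore + functionWeight * importance;
    }
}
```

With a weight of 0.5, a page with importance 1.0 and text score 0.8 (combined 1.3) outranks a page with importance 0.1 and text score 1.0 (combined 1.05). If stored values span a wide range, a dampening transform such as a logarithm keeps them from swamping text relevance.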
If any of this was unclear, or I didn't actually answer your question, please
let me know. I've done a lot of "hacking" to influence the result rankings.
Have a great day,
Ann
----- Original Message ----
From: "Harmesh, V2solutions" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Thursday, June 21, 2007 5:06:00 AM
Subject: How to score a particular page higher than the other pages
Hi,
Can anyone help me score a particular page higher than the others?
For example, I crawl forums, where threads are more important than the other
pages. I want links like (http://forum.ottawagolf.com/showthread.php) at the
top of my results, but instead I am getting links like
(http://forums.roadbikereview.com/forumdisplay.php), which I would prefer at a
lower priority.
Thanks in advance.
--
View this message in context:
http://www.nabble.com/How-to-score-a-paticular-page-higher-than-the-other-pages-tf3957718.html#a11230107
Sent from the Nutch - User mailing list archive at Nabble.com.
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general