"Robeyns Bart" <[EMAIL PROTECTED]> wrote:

> I think you might also want to look at the ScoringFilter-extension point in 
> Nutch. ScoringFilters define the 'boost-factor' for each indexed document; 
> the boost-factor is used by Lucene to boost the scoring for a document in any 
> given query.
> Nutch by default applies its 'OPICScoringFilter', which defines the boost as 
> a function of the inlinks to a document.
> 
> You can write your own ScoringFilter that either manipulates the OPIC-defined 
> boost (by chaining it after OPICScoringFilter) or simply defines its own. 
> I'm not sure, but it looks like you might get a hold of both the document's 
> url and its content in the 'passScoreBeforeParsing'-method, which gets called 
> at fetch-time. 
> 
> One way to implement your own filter for your purpose:
> Check whether the url is a forum-thread, parse the content with a validating 
> html-parser and derive a score from it; multiply the obtained score with the 
> one that gets passed in (that's the OPIC-score) and return the result.
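
A minimal sketch of the score computation described above, in plain Java
with no Nutch dependency. The class and method names are hypothetical, the
"showthread.php" test is an assumption taken from the URLs quoted in this
thread, and the validation score is assumed to be computed elsewhere and to
lie in [0, 1]. Wiring this into an actual ScoringFilter is not shown.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Nutch: computes the boost that a custom
// scoring filter could return for a page.
final class ForumThreadBoost {

    // Assumption: thread pages are recognised by a path ending in
    // "/showthread.php", as in the example URLs from this thread.
    static boolean isForumThread(String url) {
        try {
            String path = new URI(url).getPath();
            return path != null && path.endsWith("/showthread.php");
        } catch (URISyntaxException e) {
            return false; // unparseable URLs are treated as non-threads
        }
    }

    // initScore: the score passed in by the previous filter (e.g. OPIC).
    // validationScore: externally derived score, clamped to [0, 1].
    static float boost(String url, float initScore, float validationScore) {
        if (!isForumThread(url)) {
            return initScore; // leave other pages untouched
        }
        float clamped = Math.max(0.0f, Math.min(1.0f, validationScore));
        // Multiply by (1 + score) so a thread page is never pushed below
        // the score it came in with.
        return initScore * (1.0f + clamped);
    }
}
```

The (1 + score) factor is a design choice of this sketch: multiplying by the
raw validation score, as suggested above, would zero out a thread page that
fails validation completely.
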
> 
> Make sure to define the order in which scoring filters are applied with the 
> scoring.filter.order property in nutch-site.xml:
> <property>
>   <name>scoring.filter.order</name>
>   <value>org.apache.nutch.scoring.opic.OPICScoringFilter <yourfilter></value>
> </property>
> 
> Hope this helps,
> 
> Bart
> 
> 
> -----Original Message-----
> From: Milan Krendzelak [mailto:[EMAIL PROTECTED]
> Sent: Fri 6/22/2007 18:21
> To: [EMAIL PROTECTED]
> Subject: RE: How to score a paticular page higher than the other pages
>  
> Hi Ann, I would be really interested in custom boosting of documents 
> by some external factor.
> For example, the external factor could be a web page's W3C validation 
> score.
> In this case, I would like to rank search results not only by 
> relevancy but also by the validation score. I don't want to 
> sort on it, but to let the external factor influence the document boost.
>  
> I would appreciate any help or suggestions about this.
> Thanks.
>  
> Milan Krendzelak
> 
> Senior Software Developer
> dotMobi (mTLD Top Level Domain, Ltd.)
> 11 Exchange Place, IFSC, Dublin 1, Ireland
> Phone: + 353.1.854.1100 Fax: +353.1.791.8569
> 
> ________________________________
> 
> From: Annona Keene [mailto:[EMAIL PROTECTED]
> Sent: Fri 22/06/2007 17:06
> To: [EMAIL PROTECTED]
> Subject: Re: How to score a paticular page higher than the other pages
> 
> 
> 
> Hi Harmesh,
> 
> I did something similar to this, and I can offer a few suggestions. I'm not 
> sure any of these is the *right* answer, but I've found them to be effective 
> for my purposes.
> 
> Have a field that you boost significantly for the page(s) you'd like higher 
> in the results. We have something like a keywords field where we put terms 
> that should bring up a page very high in the results. It's worked quite well.
> 
> Another thing we've done is use the FunctionQuery from Solr. (Though I 
> believe this might be part of Lucene 2.2.0 proper now. I'm not certain.)  The 
> API is here: 
> http://lucene.apache.org/solr/api/org/apache/solr/search/function/FunctionQuery.html
> This allows you to store a numerical value in a field for each page, so that 
> the score is influenced by that value. In our case, we set high values for 
> the "important" pieces of the site and smaller values for less "important" 
> pieces.
> 
> If any of this was unclear, or I didn't actually answer your question, please 
> let me know. I've done a lot of "hacking" to influence the result rankings.
> 
> Have a great day,
> Ann
> 
> ----- Original Message ----
> From: "Harmesh, V2solutions" <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Thursday, June 21, 2007 5:06:00 AM
> Subject: How to score a paticular page higher than the other pages
> 
> 
> Hi,
>   Can anyone help me score a particular page higher than the
> others?
> For example:
> 
>          In my case I crawl forums, where threads are more important
> than the other pages.
> I want links like (http://forum.ottawagolf.com/showthread.php) at the
> top of my results. Instead,
> I am getting links like
> (http://forums.roadbikereview.com/forumdisplay.php), which I would prefer at
> lower
> priority.
> Thanks in advance.
> 
> --
> View this message in context: 
> http://www.nabble.com/How-to-score-a-paticular-page-higher-than-the-other-pages-tf3957718.html#a11230107
> Sent from the Nutch - User mailing list archive at Nabble.com.

We have written a custom scoring plugin which modifies the boost of 
documents; it works as you described, and it works great.

-- 
Damian Florczyk aka thunder
Gentoo Developer, Gentoo/NetBSD Development Lead

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general