"Robeyns Bart" <[EMAIL PROTECTED]> wrote: > I think you might also want to look at the ScoringFilter-extension point in > Nutch. ScoringFilters define the 'boost-factor' for each indexed document; > the boost-factor is used by Lucene to boost the scoring for a document in any > given query. > Nutch by default applies its 'OPICScoringFilter', which defines the boost as > a function of the inlinks to a document. > > You can write your own ScoringFilter that either manipulates the OPIC-defined > boost (by chaining it after OPICScoringFilter) or simply defines it's own. > I'm not sure, but it looks like you might get a hold of both the document's > url and its content in the 'passScoreBeforeParsing'-method, which gets called > at fetch-time. > > One way to to implement your own filter for your purpose: > Check whether the url is a forum-thread, parse the content with a validating > html-parser and derive a score from it; multiply the obtained score with the > one that gets passed in (that's the OPIC-score) and return the result. > > Make sure to define the order of scoringfilters to be applied with the > scoring.filter.order in nutch-site.xml: > <property> > <name>scoring.filter.order</name> > <value>org.apache.nutch.scoring.opic.OPICScoringFilter <yourfilter></value> > </property> > > Hope this helps, > > Bart > > > -----Original Message----- > From: Milan Krendzelak [mailto:[EMAIL PROTECTED] > Sent: Fri 6/22/2007 18:21 > To: [EMAIL PROTECTED] > Subject: RE: How to score a paticular page higher than the other pages > > Hi Ann, I am really would be interested in custom boosting of the documents > by some external factor. > for example, external factor is the score of the web page validation > according to the W3C... > in this case, I would like to display search result not only according to the > relevancy but also according the validation score.... And I don't want to > sort, but make influence on the doc boost by external factor. 
>
> I would appreciate any help or suggestions about this.
> Thanks.
>
> Milan Krendzelak
>
> Senior Software Developer
> dotMobi (mTLD Top Level Domain, Ltd.)
> 11 Exchange Place, IFSC, Dublin 1, Ireland
> Phone: + 353.1.854.1100 Fax: +353.1.791.8569
>
> ________________________________
>
> From: Annona Keene [mailto:[EMAIL PROTECTED]
> Sent: Fri 22/06/2007 17:06
> To: [EMAIL PROTECTED]
> Subject: Re: How to score a paticular page higher than the other pages
>
>
>
> Hi Harmesh,
>
> I did something similar to this, and I can offer a few suggestions. I'm not
> sure any of these is the *right* answer, but I've found them to be effective
> for my purposes.
>
> Have a field that you boost significantly for the page(s) you'd like higher
> in the results. We have something like a keywords field where we put terms
> that should bring up a page very high in the results. It's worked quite well.
>
> Another thing we've done is use the FunctionQuery from Solr. (Though I
> believe this might be part of Lucene 2.2.0 proper now. I'm not certain.) The
> API is here:
> http://lucene.apache.org/solr/api/org/apache/solr/search/function/FunctionQuery.html
> This allows you to assign a numerical value to a field for each page of your
> site; the score is then influenced by that value. So in our case, we set high
> values for the "important" pieces of the site and smaller values for the less
> "important" pieces.
>
> If any of this was unclear, or I didn't actually answer your question, please
> let me know. I've done a lot of "hacking" to influence the result rankings.
>
> Have a great day,
> Ann
>
> ----- Original Message ----
> From: "Harmesh, V2solutions" <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Thursday, June 21, 2007 5:06:00 AM
> Subject: How to score a paticular page higher than the other pages
>
>
> Hi,
> Can anyone help me work out how to score a particular page higher than the
> others?
> For example:
>
> In my case I crawl forums, where threads are more important
> than the other pages.
> I want links like (http://forum.ottawagolf.com/showthread.php) at the
> top of my results; instead
> I am getting links like this
> (http://forums.roadbikereview.com/forumdisplay.php), which I would prefer at
> lower priority.
> Thanks in advance.
>
> --
> View this message in context:
> http://www.nabble.com/How-to-score-a-paticular-page-higher-than-the-other-pages-tf3957718.html#a11230107
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
We have written a custom scoring plugin that modifies the boost of documents; it works as you described, and it works great.

--
Damian Florczyk aka thunder
Gentoo Developer, Gentoo/NetBSD Development Lead

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
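
For the forum case from the original question, the URL check and score multiplication that Bart describes could be sketched roughly as follows. This is a standalone illustration only, not the Nutch ScoringFilter API: the class name ForumBoost, its methods, and the multiplier values are all assumptions made up for this example, and a real plugin would call something like this from inside its own ScoringFilter implementation.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper (not part of Nutch): derives a boost multiplier from a URL. */
final class ForumBoost {
    // Assumed multipliers, chosen only for illustration; tune them for your index.
    static final float THREAD_BOOST = 2.0f;
    static final float LISTING_BOOST = 0.5f;

    private ForumBoost() {}

    /** Returns the multiplier for a URL, based only on the script name in its path. */
    public static float multiplierFor(String url) {
        String path;
        try {
            path = new URI(url).getPath();
        } catch (URISyntaxException e) {
            return 1.0f; // unparseable URL: leave the score unchanged
        }
        if (path == null) {
            return 1.0f; // opaque URI without a path: leave the score unchanged
        }
        if (path.endsWith("/showthread.php")) {
            return THREAD_BOOST;  // forum thread: rank higher
        }
        if (path.endsWith("/forumdisplay.php")) {
            return LISTING_BOOST; // forum listing page: rank lower
        }
        return 1.0f;
    }

    /** Multiplies the score that gets passed in (e.g. the OPIC score) by the multiplier. */
    public static float boost(String url, float incomingScore) {
        return incomingScore * multiplierFor(url);
    }
}
```

Calling ForumBoost.boost(url, score) on a showthread.php URL doubles the incoming score under these assumed values, while a forumdisplay.php URL halves it and any other page keeps its score as passed in.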
