Re: Weighting different html text nodes - h1,h2 etc..

Ken Krugler Thu, 09 Jul 2009 06:41:10 -0700

Hi, would I be correct in thinking that Nutch, when indexing an HTML
document, does not weight the different text nodes (h1, h2, anchor, etc.)
differently, and instead just lumps all the text together as one? (This is
the impression I get from looking at
org.apache.nutch.parse.html.HtmlParser.)

Yes, AFAIK there's no special weighting given to text pulled from the body of the HTML.

I believe Nutch does give higher weight to the anchor text found for links that point to the page, which is a key factor in generating better search results.
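
To make the idea concrete, here is a toy sketch of per-field weighting. It is not Nutch's or Lucene's actual scoring code: the field names, the weights, and the score() helper are all made up for illustration. It only shows how a match in inbound anchor text can count for more than a match in the body, where h1, h2 and paragraph text all end up in one field.

```python
# Hypothetical per-field weights; illustrative values, not Nutch defaults.
FIELD_WEIGHTS = {"anchor": 2.0, "title": 1.5, "content": 1.0}


def score(doc, term, weights=FIELD_WEIGHTS):
    """Sum the occurrences of `term` in each field, scaled by the field weight."""
    term = term.lower()
    total = 0.0
    for field, text in doc.items():
        # Naive whitespace tokenization; a real analyzer does much more.
        count = text.lower().split().count(term)
        total += weights.get(field, 1.0) * count
    return total


# A made-up page: "anchor" holds text from links pointing at the page,
# "content" holds all body text lumped together (h1, h2, p, ...).
page = {
    "anchor": "nutch tutorial",
    "title": "Getting started",
    "content": "This nutch guide covers crawling and indexing",
}

print(score(page, "nutch"))
```

With these assumed weights, the single anchor match contributes twice as much as the single body match. Weighting h1 or h2 text separately would require the parser to emit those nodes as their own fields first.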


-- Ken
--
Ken Krugler
+1 530-210-6378
