Weighting different HTML text nodes - h1, h2, etc.

Joel Halbert Thu, 09 Jul 2009 06:31:26 -0700

Hi,

Would I be correct in thinking that Nutch, when indexing an HTML
document, does not weight the different text nodes (h1, h2, anchor, etc.)
differently, and instead lumps all the text together as one? This is
the impression I get from looking at
org.apache.nutch.parse.html.HtmlParser.
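To make the distinction concrete, here is a minimal sketch of the two behaviours I mean. It is not Nutch code and makes no claim about what HtmlParser actually does; it uses Python's standard html.parser, and the names TAG_WEIGHTS, FieldedTextExtractor and extract, as well as the weight values, are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser

# Hypothetical per-tag weights, for illustration only (not Nutch settings).
TAG_WEIGHTS = {"h1": 3.0, "h2": 2.0, "a": 1.5}


class FieldedTextExtractor(HTMLParser):
    """Collects text both as one lump and grouped by enclosing tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tags
        self.all_text = []   # every text chunk, regardless of tag
        self.by_tag = {}     # weighted tag -> list of text chunks

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the innermost matching open tag, tolerating sloppy markup.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        self.all_text.append(text)
        # Attribute the text to the innermost weighted ancestor, if any.
        for tag in reversed(self.open_tags):
            if tag in TAG_WEIGHTS:
                self.by_tag.setdefault(tag, []).append(text)
                break


def extract(html):
    """Return (lumped_text, text_per_weighted_tag) for an HTML string."""
    parser = FieldedTextExtractor()
    parser.feed(html)
    parser.close()
    lumped = " ".join(parser.all_text)
    fielded = {tag: " ".join(chunks) for tag, chunks in parser.by_tag.items()}
    return lumped, fielded
```

The first return value corresponds to "all text as one"; the second keeps heading and anchor text apart, so that an indexer could put each into its own field and apply a boost such as the ones in TAG_WEIGHTS.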


Rgs, 
Joel
