Most web pages have sections such as a navigation bar, a header, a left column
of related links, and a footer.  How can I prevent Nutch from returning search
results whose keywords appear only outside the main body of the page?  For
example, a keyword may appear in the navigation bar or footer but nowhere in
the main body, in which case the page is probably not relevant.

Maybe I could:

a) specify which sections to index?
b) specify which sections not to index?
c) build a parse filter that strips out the non-main content?
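Option c) might look roughly like the following.  This is a minimal,
standalone sketch, not Nutch code: it uses the JDK's XML DOM parser on
well-formed XHTML, and the tag names and id/class values it treats as
navigation, header, sidebar or footer (CHROME_TAGS, CHROME_NAMES) are
hypothetical guesses that would need tuning per site.  In a real Nutch
plugin, the same tree walk would presumably run inside an HTML parse filter
(Nutch's HtmlParseFilter extension point) over the DOM that Nutch's HTML
parser already builds, with the filtered text used as the indexed text;
check the plugin API of your Nutch version before relying on that.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MainContentExtractor {

    // Hypothetical: tag names treated as page "chrome" rather than main body.
    private static final Set<String> CHROME_TAGS =
            Set.of("nav", "header", "footer", "aside");

    // Hypothetical: id/class values treated as chrome; real sites vary widely.
    private static final Set<String> CHROME_NAMES =
            Set.of("nav", "navigation", "menu", "sidebar", "header", "footer");

    /** Parses well-formed XHTML, drops chrome elements, returns the remaining text. */
    public static String extractMainText(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        // Collect first, then remove, so the tree is not mutated while it is walked.
        List<Element> toRemove = new ArrayList<>();
        collectChrome(doc.getDocumentElement(), toRemove);
        for (Element e : toRemove) {
            e.getParentNode().removeChild(e);
        }
        // Collapse whitespace in whatever text is left (the "main body").
        return doc.getDocumentElement().getTextContent().replaceAll("\\s+", " ").trim();
    }

    // Walks the children of 'parent'; the root element itself is never removed.
    private static void collectChrome(Element parent, List<Element> out) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            Element el = (Element) child;
            if (isChrome(el)) {
                out.add(el); // the whole subtree is dropped with it
            } else {
                collectChrome(el, out);
            }
        }
    }

    private static boolean isChrome(Element el) {
        if (CHROME_TAGS.contains(el.getTagName().toLowerCase(Locale.ROOT))) {
            return true;
        }
        // getAttribute returns "" when the attribute is absent.
        if (CHROME_NAMES.contains(el.getAttribute("id").toLowerCase(Locale.ROOT))) {
            return true;
        }
        for (String cls : el.getAttribute("class").toLowerCase(Locale.ROOT).split("\\s+")) {
            if (CHROME_NAMES.contains(cls)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body>"
                + "<div id=\"nav\">Home Products Search</div>"
                + "<div id=\"content\"><p>Main article text about crawling.</p></div>"
                + "<footer>Copyright Example Corp</footer>"
                + "</body></html>";
        System.out.println(extractMainText(page)); // prints "Main article text about crawling."
    }
}
```

This is the blacklist form (option b).  The whitelist form (option a) would
instead keep only a known container, such as a hypothetical
<div id="content">, which is more precise but depends on each site's markup.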

Thanks.
-- 
View this message in context: 
http://www.nabble.com/How-to-ignore-search-results-that-don%27t-have-related-keywords-in-main-body--tp22654668p22654668.html
Sent from the Nutch - User mailing list archive at Nabble.com.
