Most webpages have sections such as a navigation bar, a header, a left column of related links, a footer, etc. How can I prevent Nutch from returning search results for pages that contain the keywords only outside the main body? For example, a keyword may appear in the navigation bar or footer but not in the main body of the page, in which case the page is probably not relevant.
Maybe I can: a) specify sections to index? b) specify sections to not index? c) build a parse filter that strips out the content? Thanks.
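To make option (c) concrete, here is a minimal sketch of the stripping step in plain Java, using only the JDK's DOM classes. It is not a Nutch plugin. As I understand it, Nutch's HTML parse filter extension point passes the filter a DOM of the page, so a walk like this could be adapted there. The class name `MainBodyText`, the `SKIP` list of id/class values, and the sample page are all hypothetical, and the input is assumed to be well-formed XHTML so that the JDK parser accepts it.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class MainBodyText {
    // Hypothetical id/class values that mark non-main sections; adjust per site.
    private static final Set<String> SKIP = new HashSet<>(
            Arrays.asList("nav", "navigation", "header", "footer", "sidebar"));

    // True if the node is an element whose id or any class token is in SKIP.
    public static boolean isBoilerplate(Node node) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return false;
        }
        Element e = (Element) node;
        // getAttribute returns "" when the attribute is absent.
        if (SKIP.contains(e.getAttribute("id").toLowerCase())) {
            return true;
        }
        for (String cls : e.getAttribute("class").toLowerCase().split("\\s+")) {
            if (SKIP.contains(cls)) {
                return true;
            }
        }
        return false;
    }

    // Depth-first walk that appends text nodes, skipping boilerplate subtrees.
    public static void collect(Node node, StringBuilder out) {
        if (isBoilerplate(node)) {
            return;
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(text);
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, out);
        }
    }

    // Parses well-formed XHTML and returns only the text outside SKIP sections.
    public static String extract(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        StringBuilder out = new StringBuilder();
        collect(doc.getDocumentElement(), out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body>"
                + "<div id=\"nav\">Nutch Lucene Hadoop</div>"
                + "<div id=\"content\"><p>Notes on gardening.</p></div>"
                + "<div class=\"site footer\">About Nutch</div>"
                + "</body></html>";
        System.out.println(extract(page)); // prints "Notes on gardening."
    }
}
```

If only this stripped text were indexed, a query for "Nutch" would no longer match the sample page, since the word occurs only in the navigation and footer sections. A fixed list of ids and classes is the simplest choice but has to be tuned per site; it will not work for pages that do not label their sections.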