Hi everyone, I'm using Nutch 1.2 to index an intranet site (with Solr as the indexer). I would like to exclude certain parts of the HTML pages, such as the footer. I found previous posts about this problem, but none with a clear solution. Can anyone point me to some relevant documentation? From what I understand, I should write a plugin that implements HtmlParseFilter. Is that correct?
Thanks,
Matthias