Hi everyone,

I'm using Nutch 1.2 to index an intranet site (with Solr as the indexer). I
would like to exclude certain parts of the HTML pages, such as the footer.
I found previous posts about this problem, but none with a clear solution.
Can anyone point me to some relevant documentation? From what I understand,
I should write an HtmlParseFilter plugin. Is that correct?
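For illustration, here is a minimal sketch of the DOM-pruning step such a filter might perform. It uses only the JDK's org.w3c.dom API, not Nutch's HtmlParseFilter interface, and it assumes two things that are hypothetical: the page is well-formed XHTML, and the unwanted parts are marked with a `<footer>` tag or an id/class of "footer". The class and method names are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class FooterStripper {
    // Hypothetical marker: <footer> tags, or elements with id/class "footer".
    static final String MARKER = "footer";

    static boolean isExcluded(Element e) {
        if (MARKER.equalsIgnoreCase(e.getTagName())) return true;
        if (MARKER.equals(e.getAttribute("id"))) return true; // "" if absent
        for (String cls : e.getAttribute("class").trim().split("\\s+")) {
            if (MARKER.equals(cls)) return true;
        }
        return false;
    }

    // Recursively remove excluded elements below the given node.
    static void prune(Node node) {
        NodeList children = node.getChildNodes();
        // Snapshot first: removing from a live NodeList while iterating skips nodes.
        List<Node> snapshot = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) snapshot.add(children.item(i));
        for (Node child : snapshot) {
            if (child.getNodeType() == Node.ELEMENT_NODE && isExcluded((Element) child)) {
                node.removeChild(child);
            } else {
                prune(child);
            }
        }
    }

    // Parse well-formed XHTML, drop excluded parts, return the remaining text.
    public static String visibleText(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        prune(doc.getDocumentElement());
        return doc.getDocumentElement().getTextContent().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><div id=\"content\">Intranet news</div>"
            + "<div id=\"footer\">Copyright notice</div></body></html>";
        System.out.println(visibleText(page)); // prints "Intranet news"
    }
}
```

Inside Nutch, the filter receives the parsed page as a DocumentFragment, so the same pruning idea would apply there. How the cleaned text then replaces the already-extracted parse text is not shown here.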

Thanks
Matthias
