Exclude html-content from index

2010-10-07 Thread Matthias Paul
Hi everyone, I'm using Nutch 1.2 to index an intranet site (with Solr as the indexer). I would like to exclude certain parts of the HTML pages, such as the footer. I found previous posts about this problem, but none with a clear solution. Can anyone point me to some relevant

Re: Exclude html-content from index

2010-10-07 Thread Israel
Hi Matthias, I don't have the answer to your question, but I wanted to ask how to integrate Solr with Nutch 1.2 and what benefits that brings.

Re: Exclude html-content from index

2010-10-07 Thread Israel
Thanks Matthias, I regret not being able to help you with your problem. Regards