How can I exclude certain MIME types from crawling, for example Word documents?
If I have parse-tika in plugin.includes, it will parse them. Do I have
to change parse-plugins.xml?
I can't exclude them in regex-urlfilter, as the .doc extension is not
present in the URLs.
Thanks
Matthias
-Original message-
From: Matthias Paul magethle.nu...@gmail.com
Sent: Fri 18-May-2012 14:57
To: user@nutch.apache.org
Subject: Exclude certain mime-types