Hello,

During a fetch, the fetcher failed to retrieve a certain page with the following exception:
(URL is masked with ****)

    Error parsing: http://*********/validCode.asp: org.apache.nutch.parse.ParseException: parser not found for contentType=image/bmp url=http://0086jia.com/include/validCode.asp
        at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:81)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:349)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:194)

I've configured both regex-urlfilter.txt:

    # skip image and other suffixes we can't yet parse
    -\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|wmv|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|jpeg|JPEG|bmp|BMP|swf)$

and suffix-urlfilter.txt:

    ### prohibit these
    # pictures
    .gif
    .jpg
    .jpeg
    .bmp
    .png
    .tif
    .tiff

Both plugins are listed in the "plugin.includes" property in nutch-site:

    <property>
      <name>plugin.includes</name>
      <value>protocol-http|urlfilter-regex|urlfilter-suffix|parse-(text|html|js|zip)|query-(basic|site|url)|index-basic|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>

My crawl is done by running nutch inject/generate/fetch loops.

Am I missing some property I should configure in order to avoid fetching/crawling content types I don't want to? (The same goes for xml, jpeg, and other file types.)

Thanks!
Eyal.
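
A note on the regex rule quoted above: the following is a minimal sketch, not Nutch code, that applies the same suffix pattern to a URL shaped like the failing one. The class name `SuffixFilterCheck`, the helper `isSkipped`, and the `example.com` host are hypothetical stand-ins, and the sketch assumes the rule is applied as an unanchored regex search on the URL string. It only illustrates that a URL ending in `.asp` does not match the suffix list, even when the server returns `image/bmp` content for it.

```java
import java.util.regex.Pattern;

public class SuffixFilterCheck {
    // Same suffix alternation as the "-" rule in regex-urlfilter.txt.
    // The leading "-" in that file marks an exclude rule and is not part of the regex.
    private static final Pattern SKIP = Pattern.compile(
        "\\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|wmv|ppt|mpg|xls|gz|rpm|tgz"
        + "|mov|MOV|jpeg|JPEG|bmp|BMP|swf)$");

    // Returns true if the exclude rule would match the URL.
    // Assumption: the rule is evaluated as an unanchored search, hence find().
    static boolean isSkipped(String url) {
        return SKIP.matcher(url).find();
    }

    public static void main(String[] args) {
        // Hypothetical host; the path mirrors the one in the stack trace.
        System.out.println(isSkipped("http://example.com/include/validCode.asp")); // prints "false"
        // A URL that really ends in .bmp is matched by the rule.
        System.out.println(isSkipped("http://example.com/images/logo.bmp"));       // prints "true"
    }
}
```

Both filters shown in the post look only at the URL text, so this check says nothing about the Content-Type the server sends back.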