On Sat, 2005-04-16 at 13:21 +0200, Boris Kroeger wrote:
> One file of mine contained an exclamation mark (!) and was not processed 
> by the nutch crawl.
> After I removed it nutch was able to process it.
> Maybe there are further characters?
> 
> Is this worth an issue in JIRA?

See crawl-urlfilter.txt and regex-urlfilter.txt:
# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]

(another minor but important difference between filesystem indexing and
web indexing)
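
To illustrate why a "!" in a file name is enough to drop the file, here is a minimal Python sketch of first-match-wins URL filtering. It is not Nutch's actual code: the names RULES and accepts are made up for illustration, and the character class is assumed to be the stock [?*!@=] from the default filter files rather than taken from this thread.

```python
import re

# Assumed rule set modeled on the stock filter files. The exact
# character class is an assumption, not quoted from this thread.
RULES = [
    ("-", re.compile(r"[?*!@=]")),  # skip probable queries, etc.
    ("+", re.compile(r".")),        # accept anything else
]


def accepts(url, rules=RULES):
    """Return True if the first rule whose pattern is found in url is a '+' rule."""
    for sign, pattern in rules:
        if pattern.search(url):
            return sign == "+"
    # No rule matched at all: treat the URL as rejected.
    return False


if __name__ == "__main__":
    for url in ("file:///docs/report.txt", "file:///docs/hello!.txt"):
        print(url, "accepted" if accepts(url) else "skipped")
```

Under these assumptions, a file URL containing "!" is skipped before it is ever fetched. For a filesystem crawl, removing "!" from the character class in the filter file would presumably let such files through.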

