On Sat, 2005-04-16 at 13:21 +0200, Boris Kroeger wrote:
> One file of mine contained an exclamation mark (!) and was not processed
> by the nutch crawl.
> After I removed it nutch was able to process it.
> Maybe there are further characters?
>
> Is this worth an issue in JIRA?

See crawl-urlfilter.txt and regex-urlfilter.txt. By default they reject any URL that contains one of these characters:

  # skip URLs containing certain characters as probable queries, etc.
  -[?*!@=]

(Another minor but important difference between filesystem indexing and web indexing.)

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
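The Java sketch below shows why the file was skipped. It models the filter rule quoted in the reply, under several assumptions. Each rule line starts with '+' (accept) or '-' (reject) and is followed by a regex. A rule applies if its regex matches anywhere in the URL. The first matching rule decides the outcome, and a URL that matches no rule is rejected. This reflects my understanding of how Nutch reads these files, not Nutch's actual RegexURLFilter code. The class name `UrlFilterSketch`, the catch-all `+.` rule, and the example paths are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class UrlFilterSketch {
    // One hypothetical rule: accept or reject, plus the regex it applies to.
    static final class Rule {
        final boolean accept;
        final Pattern pattern;
        Rule(boolean accept, Pattern pattern) { this.accept = accept; this.pattern = pattern; }
    }

    private final List<Rule> rules = new ArrayList<>();

    public UrlFilterSketch(String config) {
        for (String raw : config.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) continue; // skip blanks and comments
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') continue;             // this sketch ignores unsigned lines
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    /** Returns the URL if accepted, or null if rejected; the first matching rule wins. */
    public String filter(String url) {
        for (Rule r : rules) {
            if (r.pattern.matcher(url).find()) {                  // match anywhere in the URL
                return r.accept ? url : null;
            }
        }
        return null;                                              // no rule matched: reject
    }

    public static void main(String[] args) {
        String config = String.join("\n",
            "# skip URLs containing certain characters as probable queries, etc.",
            "-[?*!@=]",
            "# accept anything else (hypothetical catch-all rule)",
            "+.");
        UrlFilterSketch f = new UrlFilterSketch(config);
        System.out.println(f.filter("file:/home/boris/docs/report.txt")); // prints the URL
        System.out.println(f.filter("file:/home/boris/docs/hello!.txt")); // prints "null"
    }
}
```

The '!' in the second path matches the `-[?*!@=]` character class before the catch-all rule is reached, so that file is dropped. For a filesystem crawl, one option is to remove '!' from that character class in the local filter file. Another is to put a more specific '+' rule above it. Either way, the first-match-wins order is what matters.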
