frgrfg gfsdgffsd wrote:
> Hi all,
>
> I have a problem with the crawl/fetch of one website (www.lequipe.fr),
> although it works fine for another (www.lemonde.fr).
>
> Here are the errors:
> ERROR [MAT] 2006-11-22 00:36:20,860 - Http.invoke0(?) | java.lang.IllegalArgumentException: null metadata
> ERROR [MAT] 2006-11-22 00:36:20,870 - Http.invoke0(?) |   at org.apache.nutch.protocol.Content.<init>(Content.java:60)
> ERROR [MAT] 2006-11-22 00:36:20,870 - Http.invoke0(?) |   at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:196)
> ERROR [MAT] 2006-11-22 00:36:20,870 - Http.invoke0(?) |   at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:162)
>
> I don't understand why the metadata is null when there is some metadata on
> the pages...
>
What version of Nutch are you running?

> I also have this message just before:
> INFO [MAT] 2006-11-22 00:36:32,477 - HttpBase.getProtocolOutput(194) | Skipping: http://www.lequipe.fr/ exceeds fetcher.max.crawl.delay, max=30, Crawl-Delay=120
>
> and I can't find this property in nutch-site.xml

You need to add it there:

<property>
  <name>fetcher.max.crawl.delay</name>
  <value> your value here </value>
</property>

--
Sami Siren

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
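
For illustration, the property above could sit in nutch-site.xml roughly as follows. This is only a sketch under two assumptions: that the installation uses the <configuration> root element, and that 120 seconds is an acceptable wait. The value 120 is chosen here purely because it matches the Crawl-Delay reported in the "Skipping" log line; it is not a recommendation from the thread. The value is in seconds, and a page is skipped when the site's robots.txt Crawl-Delay is larger than it.

```xml
<?xml version="1.0"?>
<!-- nutch-site.xml: site-specific overrides of the shipped defaults.
     Assumed layout; check the root element used by your Nutch version. -->
<configuration>
  <property>
    <name>fetcher.max.crawl.delay</name>
    <!-- Assumed value: 120 matches the Crawl-Delay shown in the log above. -->
    <value>120</value>
  </property>
</configuration>
```

With max=30, as in the log, a site asking for Crawl-Delay=120 is skipped; raising the limit to at least 120 should let the fetcher wait and fetch it instead, at the cost of a much slower crawl of that host.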
