frgrfg gfsdgffsd wrote:
> Hi all,
> 
> I have a problem with the crawl/fetch of one website (www.lequipe.fr),
> although it works fine for another (www.lemonde.fr).
> 
> Here are the errors:
> ERROR [MAT] 2006-11-22 00:36:20,860 - Http.invoke0(?) | 
> java.lang.IllegalArgumentException: null metadata
> ERROR [MAT] 2006-11-22 00:36:20,870 - Http.invoke0(?) | at 
> org.apache.nutch.protocol.Content.<init>(Content.java:60)
> ERROR [MAT] 2006-11-22 00:36:20,870 - Http.invoke0(?) | at 
> org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:196)
> ERROR [MAT] 2006-11-22 00:36:20,870 - Http.invoke0(?) | at 
> org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:162)
> 
> I don't understand why the metadata is null when there is metadata on the
> pages.
> 

What version of Nutch are you running?


> I also have this message just before:
> INFO [MAT] 2006-11-22 00:36:32,477 - HttpBase.getProtocolOutput(194) | 
> Skipping: http://www.lequipe.fr/ exceeds fetcher.max.crawl.delay, max=30, 
> Crawl-Delay=120
> 
> and I can't find this property in nutch-site.xml.

You need to add it there.

<property>
  <name>fetcher.max.crawl.delay</name>
  <value>  your value here  </value>
</property>
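
As a rough sketch, assuming the numbers in your log line are right (max=30,
Crawl-Delay=120, both in seconds), the override in nutch-site.xml could look
like the fragment below. The value 120 is only taken from that log line so
that the site's Crawl-Delay no longer exceeds the limit; it is not a
recommended setting. Check the description of fetcher.max.crawl.delay in the
nutch-default.xml of your version before relying on it.

```xml
<?xml version="1.0"?>
<!-- nutch-site.xml: site-specific overrides of nutch-default.xml -->
<configuration>
  <property>
    <name>fetcher.max.crawl.delay</name>
    <!-- Assumed value: matches the Crawl-Delay=120 reported in the log,
         so pages from that host are no longer skipped for exceeding it. -->
    <value>120</value>
  </property>
</configuration>
```

Keep in mind that with this setting the fetcher waits the full delay between
requests to that host, so fetching it will be slow.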

--
  Sami Siren

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general