Hello, Thanks.

I am running Nutch 0.8.1.
What is this property for? Should I set it to 120, as the error message 
suggests?
Another problem I have is that on some websites not all pages are fetched 
and, even weirder, some of the pages that are fetched don't actually exist...

Thanking you in advance,

Mat


----- Original Message ----
From: Sami Siren <[EMAIL PROTECTED]>
To: [email protected]
Sent: Wednesday, 22 November 2006, 22:40:10
Subject: Re: Fetch fails

frgrfg gfsdgffsd wrote:
> Hi all,
> 
> I have a problem with the crawl/fetch of one website (www.lequipe.fr), 
> although it works fine for another (www.lemonde.fr).
> 
> Here are the errors:
> ERROR [MAT] 2006-11-22 00:36:20,860 - Http.invoke0(?) | 
> java.lang.IllegalArgumentException: null metadata
> ERROR [MAT] 2006-11-22 00:36:20,870 - Http.invoke0(?) | at 
> org.apache.nutch.protocol.Content.<init>(Content.java:60)
> ERROR [MAT] 2006-11-22 00:36:20,870 - Http.invoke0(?) | at 
> org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:196)
> ERROR [MAT] 2006-11-22 00:36:20,870 - Http.invoke0(?) | at 
> org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:162)
> 
> I don't understand why the metadata is null when there is metadata on the 
> pages... 
> 

What version of Nutch are you running?


> I also have this message just before:
> INFO [MAT] 2006-11-22 00:36:32,477 - HttpBase.getProtocolOutput(194) | 
> Skipping: http://www.lequipe.fr/ exceeds fetcher.max.crawl.delay, max=30, 
> Crawl-Delay=120
> 
> and I can't find this property in nutch-site.xml

You need to add it there.

<property>
  <name>fetcher.max.crawl.delay</name>
  <value>  your value here  </value>
</property>
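
As a filled-in sketch, going only by the log line quoted above (max=30, 
Crawl-Delay=120): the value appears to be in seconds and to be compared 
against the Crawl-Delay the site announces in its robots.txt, with the 
fetcher skipping the site when the announced delay is larger. Under that 
assumption, 120 would be the smallest value that stops www.lequipe.fr from 
being skipped. The entry goes inside the <configuration> element of 
nutch-site.xml:

```xml
<configuration>
  <!-- Assumption: 120 is taken from "Crawl-Delay=120" in the log line,
       and the unit is seconds. The limit reported there was 30. -->
  <property>
    <name>fetcher.max.crawl.delay</name>
    <value>120</value>
  </property>
</configuration>
```

The trade-off, if the fetcher then honours the announced delay, is that it 
would wait about two minutes between requests to that host, so crawling 
the site would be slow.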

--
  Sami Siren







        

        
                
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
