Re: not crawling relative URLs

Kai_testing Middleton Thu, 28 Jun 2007 11:30:32 -0700

Ok, I guess I lied.

Nutch IS capable of crawling relative URLs.


Essentially what happened is that the page I was trying to crawl, 
http://www.sf911truth.org, has more than 100 outlinks on it, and the relative 
URL for about.html that I expected to see in my crawl.log was outlink #105. 
Nutch only processes the first 100 outlinks per page by default, so that link 
was silently dropped. I fixed this by setting db.max.outlinks.per.page to -1 
(unlimited number of outlinks) in nutch-site.xml.
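
For anyone hitting the same thing, the override would look roughly like the 
fragment below. This is a sketch, assuming the usual property layout of 
nutch-site.xml; the description text is my own wording, not copied from 
nutch-default.xml, and if your nutch-site.xml already has a <configuration> 
element you only need to add the <property> block inside it.

```xml
<configuration>
  <property>
    <name>db.max.outlinks.per.page</name>
    <!-- A negative value removes the per-page outlink limit -->
    <value>-1</value>
    <description>Process all outlinks on a page instead of only the
    first 100.</description>
  </property>
</configuration>
```

Note that pages with a very large number of links will now contribute all of 
them to the crawl db, so a large positive value may be a better choice than -1 
for a big crawl.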

For a detailed discussion see "Re: [Nutch-dev] NUTCH-119 :: how hard to fix":
http://www.mail-archive.com/[EMAIL PROTECTED]/msg12592.html

Now it works.

--Kai Middleton




      