Ok, I guess I lied.
Nutch IS capable of crawling relative URLs.
What actually happened is that the page I was attempting to crawl,
http://www.sf911truth.org, has more than 100 outlinks on it, and the relative
URL for about.html that I was expecting to see in my crawl.log is outlink
#105. That put it past the default limit of 100 outlinks per page, so it was
never processed. I fixed this by setting db.max.outlinks.per.page to -1
(unlimited number of outlinks) in nutch-site.xml.
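
For anyone hitting the same thing, the override looks roughly like the entry
below. This is only a sketch of the nutch-site.xml entry, assuming the usual
property layout used by the Nutch config files; the description text is my
own wording, not copied from nutch-default.xml.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Overrides the default of 100 outlinks processed per page. -->
  <property>
    <name>db.max.outlinks.per.page</name>
    <!-- A negative value means no limit: every outlink is processed. -->
    <value>-1</value>
    <description>Process all outlinks on a page (assumed wording).</description>
  </property>
</configuration>
```

If unlimited is too much for a large crawl, a positive value above the
largest outlink count you care about should also work, since the setting is
just a per-page cap.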
For a detailed discussion see "Re: [Nutch-dev] NUTCH-119 :: how hard to fix":
http://www.mail-archive.com/[EMAIL PROTECTED]/msg12592.html
Now it works.
--Kai Middleton
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general