Hi Paul,

Someone had to point this out to me too: in conf/crawl-urlfilter.txt there is a line

[EMAIL PROTECTED]

which specifies the characters that are not allowed in URLs.
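
To see how a rule like that can affect a URL, here is a rough Python sketch (not Nutch's own code). It assumes the filter file works the way its comments describe, as far as I understand them: each rule is a '+' or '-' followed by a regular expression, the first rule that matches decides, and a URL that matches nothing is ignored. The actual rule is redacted in this archive, so the character class `[?*!@=]` below is an assumption, the '+' domain rule is a hypothetical stand-in for the one the tutorial has you add, and `accepts` is just an illustrative helper name.

```python
import re

def accepts(url, rules):
    """Apply '+regex' / '-regex' rules in order; the first match decides.

    Returns True for a '+' match, False for a '-' match or no match at all.
    """
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False

URL = ("http://www.comlaw.gov.au/ComLaw/legislation/actcompilation1.nsf/sh/"
       "browse&VIEW=current&ORDER=bytitle&CATEGORY=actcompilation")

# Hypothetical rule sets: the first excludes '=', the second does not.
DOMAIN_RULE = r"+^http://([a-z0-9]*\.)*comlaw\.gov\.au/"
with_equals = ["-[?*!@=]", DOMAIN_RULE]
without_equals = ["-[?*!@]", DOMAIN_RULE]

print(accepts(URL, with_equals))     # rejected: the URL contains '='
print(accepts(URL, without_equals))  # accepted by the domain rule
```

Under those assumptions, the ".nsf" in the middle of the URL plays no part in the outcome; only the '=' characters do.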
Your URL contains '=' characters, so this rule is most likely what rejects it. Try removing that line from crawl-urlfilter.txt, or remove only the '=' from it.

regards,
Jeroen

On 3/19/07, Paul Liddelow <[EMAIL PROTECTED]> wrote:
> Hi
>
> I have set Nutch up and the crawler (following the intranet tutorial) and
> can fetch results OK for the few URLs I have tested, but for some reason I
> cannot get any results returned when I try to crawl this URL:
>
> http://www.comlaw.gov.au/ComLaw/legislation/actcompilation1.nsf/sh/browse&VIEW=current&ORDER=bytitle&CATEGORY=actcompilation
>
> I think it might have something to do with the file extension ".nsf" which
> is midway in the URL. I think the crawler cannot deal with it. Has anybody
> else had this problem or can help?
>
> Much obliged if anybody knows the answer.
>
> Cheers
> Paul

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
