Hi Paul,

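Someone had to point this out to me too: conf/crawl-urlfilter.txt contains
the line [EMAIL PROTECTED]
which lists characters that are not allowed in URLs. Any URL containing one of
them is skipped. Your URL contains '=' several times, so the '=' is the
problem, not the .nsf extension.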

Try removing this line, or remove just the '=' from it.
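To check that the .nsf part is not the culprit, here is a small Python sketch of how the filter evaluates rules. It assumes the semantics described in the comments of crawl-urlfilter.txt (the first pattern that matches decides, '+' accepts, '-' rejects, and a URL matching no pattern is dropped). It also assumes the character-class rule reads `-[?*!@=]`, as in the stock Nutch 0.8/0.9 file; check your own copy. The `url_allowed` helper and the comlaw.gov.au domain rule are hypothetical illustrations, not Nutch's API.

```python
import re

# Hypothetical helper mimicking the regex URL filter rules described in
# crawl-urlfilter.txt: the first pattern found anywhere in the URL decides;
# '+' accepts, '-' rejects, and a URL matching no pattern is rejected.
def url_allowed(url, rules):
    for line in rules:
        sign, pattern = line[0], line[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False

URL = ("http://www.comlaw.gov.au/ComLaw/legislation/actcompilation1.nsf/sh/"
       "browse&VIEW=current&ORDER=bytitle&CATEGORY=actcompilation")
# Same path up to ".../browse", with no '=' in it (illustrative only).
NSF_ONLY = ("http://www.comlaw.gov.au/ComLaw/legislation/"
            "actcompilation1.nsf/sh/browse")

# Assumed rule sets: the character-class rule with '=' (assumed default),
# the same rule with '=' removed, a hypothetical domain rule, and a final
# catch-all that skips everything else.
DOMAIN = r"+^http://([a-z0-9]*\.)*comlaw.gov.au/"
original = [r"-[?*!@=]", DOMAIN, "-."]
without_equals = [r"-[?*!@]", DOMAIN, "-."]

print(url_allowed(URL, original))        # False: '=' hits the character class
print(url_allowed(URL, without_equals))  # True: reaches the domain rule
print(url_allowed(NSF_ONLY, original))   # True: ".nsf" alone is harmless
```

Because the first match wins, the character-class rule rejects the URL before the domain rule is ever consulted. That is why editing that one line is enough.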

regards,

Jeroen

On 3/19/07, Paul Liddelow <[EMAIL PROTECTED]> wrote:
> Hi
> I have set up Nutch and its crawler (following the intranet tutorial) and
> can fetch results for the few URLs I have tested, but for some reason I
> get no results when I try to crawl this URL:
> http://www.comlaw.gov.au/ComLaw/legislation/actcompilation1.nsf/sh/browse&VIEW=current&ORDER=bytitle&CATEGORY=actcompilation
>
>
> I think it might have something to do with the file extension ".nsf" which
> is midway in the URL. I think the crawler cannot deal with it. Has anybody
> else had this problem or can help?
>
> Much obliged if anybody knows the answer.
>
> Cheers
> Paul
>

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general