I just started using Nutch to crawl an intranet site. My urls file contains
a single link to a jhtml page, which itself holds roughly 2000 links. Those
links contain characters like '?' and '=', so I removed the following rule
from crawl-urlfilter.txt:

# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
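
For reference, after that change the relevant part of my crawl-urlfilter.txt
looks roughly like this (intranet.example.com is a stand-in for our real
hostname):

# accept hosts in our intranet domain
+^http://([a-z0-9]*\.)*intranet.example.com/

# skip everything else
-.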

I finally got the crawl to work, but I see only 111 under "TOTAL urls:" when
I run the following command:

bin/nutch readdb crawlTest/crawldb -stats 
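
For context, I kicked off the crawl with something along these lines (the
depth and topN values here are illustrative, not necessarily what I actually
used):

bin/nutch crawl urls -dir crawlTest -depth 3 -topN 2000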

I'm not sure where to look at this point. Any ideas?

By the way, what's the command that dumps all the links? None of the ones I
found online seem to work...
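
For example, I've been trying variants along these lines (paths match my
crawl above), with no luck:

bin/nutch readdb crawlTest/crawldb -dump crawldump
bin/nutch readlinkdb crawlTest/linkdb -dump linkdump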

Cheers
