Hi,
    I am facing a strange problem with intranet crawling. In the current setup,
I have per-user web pages such as

http://192.168.36.200/~user1
http://192.168.36.200/~user2, and so on.
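
These URLs sit one per line in a plain-text file inside the urls directory
that I pass to the crawl command below; as far as I know the file name itself
is arbitrary. For example, my urls/seeds.txt (the name is just what I happened
to pick) contains:

http://192.168.36.200/~user1
http://192.168.36.200/~user2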

I am unable to fetch the pages (they are proper web pages with the right
access permissions). I get the following output when I run this command:

bin/nutch crawl urls -dir test_crawl -depth 10 -topN 1000

Generator: 0 records selected for fetching, exiting ...
stopping at depth=1 - no more urls to fetch ...

However, those seed pages contain many links, e.g.
http://192.168.36.200/~user1/page1.html, http://192.168.36.200/~user2/test1.html,
etc., so there should be more URLs to fetch.
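
In case it is relevant: I have not knowingly changed conf/crawl-urlfilter.txt,
and if I remember the stock file correctly, the important lines look roughly
like this (MY.DOMAIN.NAME is the placeholder shipped with the distribution,
not something I set):

# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]

# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/

# skip everything else
-.

Could it be that the +^http://... line needs to be edited to match an IP-based
host such as 192.168.36.200 before the generator will select anything? I am
only guessing here.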


Can someone please help me with this? One more thing: I am working on Windows.

Thanks a lot in advance,
Pavan
