Hi,

If you just use the nutch crawl command, you should put your domain names in crawl-urlfilter.txt, like this:

+^http://([a-z0-9]*\.)bbc.co.uk/hindi

or

+^http://www.bbc.co.uk/hindi

Good luck.

2009/4/6, Ankur Garg <[email protected]>:
> Hi All,
>
> I am trying to crawl BBC Hindi site "http://www.bbc.co.uk/hindi/ "
> but after depth 1 it shows, stopping at depth-1, no more urls to fetch.
>
> Looking at the dump for depth-1, I realised there is no content fetched from
> the page, could any one help me to figure out the root cause of the problem,
> why it's not fetching any content from the page?
>
> Had any one tried to crawl the site http://www.bbc.co.uk/hindi/ ??
>
>
> thanks in advance
>
> --
> Ankur Garg
> अँकुर गर्ग
>
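
To see how the two rules suggested in the reply behave, here is a minimal Python sketch of the matching logic. It assumes the filter checks rules in order, lets the first matching rule decide, rejects a URL that no rule matches, and searches for the pattern inside the URL rather than requiring a full match. The helper name accepts and the sample URLs are made up for illustration; this is not Nutch's own code.

```python
import re

# The two rules suggested in the reply, written as crawl-urlfilter.txt lines.
SUBDOMAIN_RULE = r"+^http://([a-z0-9]*\.)bbc.co.uk/hindi"
WWW_RULE = r"+^http://www.bbc.co.uk/hindi"


def accepts(url, rules):
    """Hypothetical helper: the first matching rule decides; no match means reject."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]  # leading '+' includes, '-' excludes
        if re.search(pattern, url):
            return sign == "+"
    return False


# Sample URLs are illustrative only.
for url in (
    "http://www.bbc.co.uk/hindi/",
    "http://news.bbc.co.uk/hindi/",
    "http://bbc.co.uk/hindi/",
    "http://www.bbc.co.uk/sport/",
):
    print(url, accepts(url, [SUBDOMAIN_RULE]), accepts(url, [WWW_RULE]))
```

Under these assumptions both rules accept http://www.bbc.co.uk/hindi/, and only the first accepts other subdomains. Note that the first pattern, as written, needs exactly one subdomain label, so a bare http://bbc.co.uk/hindi/ is rejected; putting a * after the group, as in ([a-z0-9]*\.)*, would accept that host as well.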
