Re: Problem crawling BBC Hindi Site

yanky young Mon, 06 Apr 2009 07:58:56 -0700

Hi:

If you are just using the nutch crawl command, you should put your domain
names in crawl-urlfilter.txt, like this:

+^http://([a-z0-9]*\.)bbc.co.uk/hindi

or

+^http://www.bbc.co.uk/hindi
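
To see what the patterns above let through, here is a rough Python sketch of
how such a filter file is evaluated. It assumes the usual behaviour of the
Nutch regex URL filter: each line is "+" (accept) or "-" (reject) followed by
a regex, the first rule that matches anywhere in the URL decides, and a URL
that matches no rule is dropped. The names load_rules and accepts are made up
for this illustration and are not part of Nutch, and the trailing "-." rule is
an assumed catch-all, not something from the original file.

```python
import re


def load_rules(text):
    """Parse filter lines into (include, compiled_regex) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with + or -: " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return True if the first matching rule is a '+' rule."""
    for include, pattern in rules:
        if pattern.search(url):  # unanchored search; '^' in the rule anchors it
            return include
    return False  # assumption: no matching rule means the URL is ignored


RULES = load_rules(r"""
# accept the BBC Hindi section (first pattern from this mail)
+^http://([a-z0-9]*\.)bbc.co.uk/hindi
# assumed catch-all: reject everything else
-.
""")

print(accepts(RULES, "http://www.bbc.co.uk/hindi/"))
print(accepts(RULES, "http://www.bbc.co.uk/news/"))
print(accepts(RULES, "http://bbc.co.uk/hindi/"))
```

Note that the first pattern has no "*" after the group, so it needs exactly
one label before bbc.co.uk: it accepts www.bbc.co.uk/hindi but not the bare
bbc.co.uk/hindi. That is fine for this site since the seed URL uses www.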

Good luck.



2009/4/6, Ankur Garg <[email protected]>:
> Hi All,
>
> I am trying to crawl the BBC Hindi site, "http://www.bbc.co.uk/hindi/",
> but after depth 1 it shows "stopping at depth-1, no more urls to fetch".
>
> Looking at the dump for depth 1, I realised that no content was fetched from
> the page. Could anyone help me figure out the root cause of the problem:
> why is it not fetching any content from the page?
>
> Has anyone tried crawling the site at http://www.bbc.co.uk/hindi/ before?
>
>
> thanks in advance
>
> --
> Ankur Garg
> अँकुर गर्ग
>
