Re: Print out a list of every URL fetched?

Sebastian Nagel Fri, 07 Aug 2009 00:24:16 -0700

Hi Paul,

you can use


 $NUTCH_HOME/bin/nutch readdb my_crawl/crawldb/ -dump dump_crawldb/ -format csv

then in dump_crawldb/ you'll find a CSV file listing every URL in your CrawlDb.
One column holds the status name. Keep only the records with status "db_fetched"
and you'll have your list.
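
If you'd rather not filter by hand, here is a minimal sketch in Python. It rests on assumptions, so check them against your own dump first: that the dump directory contains one or more files named part-*, that the URL is the first column, and that the status name appears as a field of its own in each row. The function name fetched_urls is only illustrative and is not part of Nutch.

```python
import csv
import glob
import os


def fetched_urls(dump_dir, status="db_fetched"):
    """Return the URLs from a readdb CSV dump whose row carries the given status."""
    urls = []
    # Assumption: the dump directory holds one or more part files (e.g. part-00000).
    for path in sorted(glob.glob(os.path.join(dump_dir, "part-*"))):
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.reader(f):
                # Assumption: the URL is the first column and the status name
                # is a separate field somewhere in the same row.
                if row and status in row:
                    urls.append(row[0])
    return urls


if __name__ == "__main__":
    # "dump_crawldb" is the output directory from the readdb command above.
    for url in fetched_urls("dump_crawldb"):
        print(url)
```

The csv module is used instead of splitting on commas so that URLs containing a comma are still read as one field, provided the dump quotes them.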

Sebastian
