There's the 'nutch readdb' command:
[EMAIL PROTECTED]:~> nutch readdb
Usage: CrawlDbReader <crawldb> (-stats | -dump <out_dir> | -topN <nnnn> <out_dir> [<min>] | -url <url>)
        <crawldb>       directory name where crawldb is located
        -stats  print overall statistics to System.out
        -dump <out_dir> dump the whole db to a text file in <out_dir>
        -url <url>      print information on <url> to System.out
        -topN <nnnn> <out_dir> [<min>]  dump top <nnnn> urls sorted by score to <out_dir>
                [<min>] skip records with scores below this value. This can significantly improve performance.
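For example, to get the list of URLs for one domain, you could dump the whole db to text and grep it. The paths below are just placeholders -- adjust 'crawl/crawldb' and the output directory to your own setup:

nutch readdb crawl/crawldb -dump crawldb_dump
grep 'www.example.com' crawldb_dump/part-*

The dump directory should contain plain-text part files (e.g. part-00000) with one record per URL, so filtering by domain is a simple text search. To check whether a single page is in the db, you can also use the -url option, e.g. 'nutch readdb crawl/crawldb -url http://www.example.com/'.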
Is this what you're looking for?
Rgrds, Thomas
On 7/25/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Is there any way to find out which web pages on a specific domain have
> been crawled by Nutch?
> In other words, is there any way to get the list of URLs that were
> downloaded and processed by Nutch?