Yes, there are tools you can use to dump the contents of the crawldb,
linkdb and segments:

crawl=./crawl
dump=$crawl/dump
bin/nutch readdb $crawl/crawldb -dump $dump/crawldb
bin/nutch readlinkdb $crawl/linkdb -dump $dump/linkdb
# $1 is the segment to dump, passed as the first script argument
bin/nutch readseg -dump $1 $dump/segments/$1
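
If you want to dump every segment rather than a single one, a small
loop around readseg should do it. This is only a sketch: the function
name dump_all_segments and the NUTCH variable are made up for
illustration, and it assumes the segments live in a segments/
directory directly under the crawl directory.

```shell
#!/bin/sh
# Sketch: dump every segment under a crawl directory as text.
# NUTCH and dump_all_segments are illustrative names, not part of Nutch.
NUTCH=${NUTCH:-bin/nutch}

dump_all_segments() {
    crawl=$1   # crawl directory, e.g. ./crawl
    dump=$2    # output directory, e.g. ./crawl/dump
    for seg in "$crawl"/segments/*; do
        # Skip the unexpanded glob when there are no segments.
        [ -d "$seg" ] || continue
        # One output directory per segment, named after the segment.
        $NUTCH readseg -dump "$seg" "$dump/segments/$(basename "$seg")"
    done
}
```

You would call it as: dump_all_segments ./crawl ./crawl/dump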

You will get usage information, including the other available options,
if you call each tool without arguments:

bin/nutch readdb
bin/nutch readlinkdb
bin/nutch readseg

Paul Tomblin wrote:
> The nutch data files are pretty opaque, and even "strings" can't extract
> anything except the occasional URL.  Is there any code to dump the contents
> of the various files in a human readable form?
