Yes, there are tools you can use to dump the contents of the crawl db, the link db and the segments.

dump=./crawl/dump
bin/nutch readdb $crawl/crawldb -dump $dump/crawldb
bin/nutch readlinkdb $crawl/linkdb -dump $dump/linkdb
bin/nutch readseg -dump $1 $dump/segments/$1

(Here $crawl is your crawl directory and $1 is the segment to dump.)

You will get more info if you call the tools without arguments:

bin/nutch readdb
bin/nutch readlinkdb
bin/nutch readseg

Paul Tomblin wrote:
> The nutch data files are pretty opaque, and even "strings" can't extract
> anything except the occasional URL. Is there any code to dump the contents
> of the various files in a human readable form?
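
If you want to dump the crawl db, the link db and every segment in one go, something along these lines might work. It is only a sketch under a few assumptions that are not from the commands above: the crawl directory is ./crawl, each segment is a subdirectory of $crawl/segments, and the names NUTCH and dump_all are made up for this example.

```shell
#!/bin/sh
# Sketch: dump crawldb, linkdb and all segments of one Nutch crawl.
# Assumptions (hypothetical, adjust to your setup):
#   - the crawl directory is ./crawl unless $crawl is already set
#   - segments are subdirectories of $crawl/segments
#   - NUTCH and dump_all are names invented for this example
NUTCH=${NUTCH:-bin/nutch}
crawl=${crawl:-./crawl}
dump=${dump:-$crawl/dump}

dump_all() {
    # Crawl db and link db: one dump directory each.
    $NUTCH readdb "$crawl/crawldb" -dump "$dump/crawldb"
    $NUTCH readlinkdb "$crawl/linkdb" -dump "$dump/linkdb"

    # One dump directory per segment, named after the segment.
    for seg in "$crawl"/segments/*; do
        [ -d "$seg" ] || continue   # skip if there are no segments
        $NUTCH readseg -dump "$seg" "$dump/segments/$(basename "$seg")"
    done
}
```

Call dump_all from the Nutch installation directory after sourcing the file. Setting NUTCH=echo first prints the commands instead of running them, which is a cheap way to check the paths.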
