Hello, I would like to find out which URLs have been crawled by Nutch, but I have a few questions about it.
1. Which db stores the URLs that have been crawled: crawldb or linkdb?

2. How do I read all the URLs from crawldb or linkdb? When I tried "bin/nutch readdb /crawl/crawldb -stats", I only got the overall statistics, not the individual URLs. When I tried "bin/nutch readdb /crawl/crawldb -url url", I only got the stats for that one specific URL. What I want is a listing of every URL that is in crawldb, not its stats.

Thank you very much.

--
View this message in context: http://www.nabble.com/How-to-read-all-the-urls-crawled-tf3966437.html#a11258254
Sent from the Nutch - User mailing list archive at Nabble.com.
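For reference, these are the exact commands I ran. (The path /crawl/crawldb reflects my local setup, and the URL below is just a placeholder for one I know is in the db.)

```shell
# Print summary statistics for the CrawlDb -- this shows counts and
# score distributions only, with no individual URLs:
bin/nutch readdb /crawl/crawldb -stats

# Look up a single known URL -- this prints the crawl status for that
# one URL only, not a list of all URLs:
bin/nutch readdb /crawl/crawldb -url http://www.example.com/
```

Neither command gives me the full list of crawled URLs that I am after.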
