Hello, I would like to find out which URLs have been crawled by Nutch, but I have a few questions about it.
1. Which db stores the URLs that have been crawled: crawldb or linkdb?

2. How do I read all the URLs from crawldb or linkdb? When I tried "bin/nutch readdb /crawl/crawldb -stats", I only got the overall statistics, not the individual URLs. When I tried "bin/nutch readdb /crawl/crawldb -url url", I only got the stats for that one specific URL. What I want is a listing of every URL that is in crawldb, not its stats.

Thank you very much.

--
View this message in context: http://www.nabble.com/How-to-read-all-the-urls-crawled-tf3966437.html#a11258254
Sent from the Nutch - User mailing list archive at Nabble.com.
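For reference, these are the exact commands I ran. (The path /crawl/crawldb reflects my local setup, and the URL below is just a placeholder for one I know is in the db.)

```shell
# Print summary statistics for the CrawlDb -- this shows counts and
# score distributions only, with no individual URLs:
bin/nutch readdb /crawl/crawldb -stats

# Look up a single known URL -- this prints the crawl status for that
# one URL only, not a list of all URLs:
bin/nutch readdb /crawl/crawldb -url http://www.example.com/
```

Neither command gives me the full list of crawled URLs that I am after.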
