cha wrote:
> Hi Sagar,
>
> Thanks for the reply.
>
> Actually, I am trying to dig out the code in the same class, but I am not
> able to figure out where the URLs are read from.
>
> When you dump the database, the file contains:
>
> http://blog.cha.com/	Version: 4
> Status: 2 (DB_fetched)
> Fetch time: Fri Apr 13 15:58:28 IST 2007
> Modified time: Thu Jan 01 05:30:00 IST 1970
> Retries since fetch: 0
> Retry interval: 30.0 days
> Score: 0.062367838
> Signature: 2b4e94ff83b8a4aa6ed061f607683d2e
> Metadata: null
>
> I have figured out the rest, but I am not sure how the URL itself is read.
>
> I just want plain URLs in the text file. Is it also possible to write the
> URLs in some XML format? If yes, then how?
>
> Awaiting your reply,
>
> Chandresh

Hi,

The crawldb is actually a MapFile, with URLs as keys (the Text class) and CrawlDatum objects as values. You can write a generic MapFile reader that extracts the keys and dumps them to a file.
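If plain URLs are all you need, a quick alternative to writing your own reader is to post-process the text dump you already have (produced, for example, by `bin/nutch readdb crawldb -dump <dir>`; the exact command and paths depend on your Nutch version, so treat them as assumptions). A minimal sketch, assuming each record in the dump starts with the URL and the `Version:` field on one tab-separated line, as in your sample, and a hypothetical dump file named `crawldb_dump.txt`:

```shell
# Each record begins with "<url>\tVersion: N", while the remaining
# fields (Status, Fetch time, ...) are on their own lines. Keeping only
# the lines that contain "Version:" and cutting the first tab-separated
# field yields one plain URL per line.
grep 'Version:' crawldb_dump.txt | cut -f1 > urls.txt
```

Once you have one URL per line in `urls.txt`, wrapping them in whatever XML format you need is a trivial follow-up step with sed or a small script.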
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
