Hi Enis,

I still can't figure out how this can be done. Can you explain in more detail, please?
Regards,
Chandresh

Enis Soztutar wrote:
> cha wrote:
>> Hi Sagar,
>>
>> Thanks for the reply.
>>
>> Actually, I am trying to dig into the code in the same class, but I am
>> not able to figure out where the URLs are read from.
>>
>> When you dump the database, the file contains:
>>
>> http://blog.cha.com/	Version: 4
>> Status: 2 (DB_fetched)
>> Fetch time: Fri Apr 13 15:58:28 IST 2007
>> Modified time: Thu Jan 01 05:30:00 IST 1970
>> Retries since fetch: 0
>> Retry interval: 30.0 days
>> Score: 0.062367838
>> Signature: 2b4e94ff83b8a4aa6ed061f607683d2e
>> Metadata: null
>>
>> I figured out the rest of the fields, but I am not sure how the URL
>> itself is read.
>>
>> I just want the plain URLs in a text file. Is it also possible to write
>> the URLs out in some XML format? If yes, then how?
>>
>> Awaiting your reply,
>>
>> Chandresh
>>
> Hi,
> The crawldb is actually a map file, which has the URLs as keys (Text
> class) and CrawlDatum objects as values. You can write a generic map
> file reader which extracts the keys and dumps them to a file.
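As an alternative to writing a Java MapFile reader, if a plain text file of URLs is all you need, you can also post-process the text dump produced by `bin/nutch readdb <crawldb> -dump <outdir>`: each record starts with the URL followed by the "Version:" field, as in the example above. Here is a minimal Python sketch under that assumption (the exact dump layout may differ between Nutch versions, so treat the pattern as a starting point):

```python
import re

def extract_urls(dump_text):
    """Pull the leading URL out of each record in a
    'nutch readdb -dump' text dump (layout as in the example above)."""
    urls = []
    for line in dump_text.splitlines():
        # Each record's first line is the URL, then whitespace, then "Version: N".
        m = re.match(r'(https?://\S+)\s+Version:', line)
        if m:
            urls.append(m.group(1))
    return urls

# Example input: the single record shown earlier in the thread.
dump = """http://blog.cha.com/\tVersion: 4
Status: 2 (DB_fetched)
Fetch time: Fri Apr 13 15:58:28 IST 2007
Score: 0.062367838
"""
print("\n".join(extract_urls(dump)))   # -> http://blog.cha.com/
```

From there, writing the list out in an XML format of your choosing is straightforward string formatting (e.g. one `<url>...</url>` element per line).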
