Hi Enis,

I still can't figure out how it can be done. Could you please explain it
in more detail?

Regards,
Chandresh

Enis Soztutar wrote:
> 
> cha wrote:
>> Hi Sagar,
>>
>> Thanks for the reply.
>>
>> Actually, I am trying to dig through the code in the same class, but I
>> am not able to figure out where the URLs are read from.
>>
>> When you dump the database, the file contains:
>>
>> http://blog.cha.com/ Version: 4
>> Status: 2 (DB_fetched)
>> Fetch time: Fri Apr 13 15:58:28 IST 2007
>> Modified time: Thu Jan 01 05:30:00 IST 1970
>> Retries since fetch: 0
>> Retry interval: 30.0 days
>> Score: 0.062367838
>> Signature: 2b4e94ff83b8a4aa6ed061f607683d2e
>> Metadata: null
>>
>> I have figured out the rest of the fields, but I am not sure how the
>> URL itself is read.
>>
>> I just want the plain URLs in a text file. Is it also possible to
>> write the URLs in some XML format? If yes, then how?
>>
>> Awaiting your reply,
>>
>> Chandresh
>>
>>   
> Hi, the crawldb is actually a Hadoop map file (MapFile), with URLs as
> keys (Text class) and CrawlDatum objects as values. You can write a
> generic map file reader that extracts the keys and dumps them to a
> file.
> 
> 
> 
> 
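[Editor's note: besides the generic map file reader Enis describes, a quick
way to get plain URLs is to post-process the text dump that
`bin/nutch readdb <crawldb> -dump <outdir>` produces (the format quoted
above). A minimal sketch; the file name `crawldb.dump`, the sample record,
and the XML layout are stand-ins, not anything Nutch prescribes:]

```shell
# Stand-in sample of one crawldb dump record (format as quoted above);
# in practice this text comes from: bin/nutch readdb <crawldb> -dump <dir>
cat > crawldb.dump <<'EOF'
http://blog.cha.com/	Version: 4
Status: 2 (DB_fetched)
Fetch time: Fri Apr 13 15:58:28 IST 2007
Retry interval: 30.0 days
EOF

# Each record begins with the URL in the first column of a line that
# starts with the scheme; keep only that first field.
grep '^http' crawldb.dump | awk '{print $1}' > urls.txt

# Optionally wrap the same list in a simple, ad hoc XML envelope.
awk 'BEGIN { print "<urls>" }
     { print "  <url>" $0 "</url>" }
     END { print "</urls>" }' urls.txt > urls.xml
```

[The XML shape here is just an illustration; Nutch does not define an XML
format for URL lists, so any schema you need would be your own.]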

-- 
View this message in context: 
http://www.nabble.com/extracting-urls-into-text-files-tf3409030.html#a9547522
Sent from the Nutch - User mailing list archive at Nabble.com.


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
