Re: [Nutch-general] extracting urls into text files

cha Tue, 20 Mar 2007 07:38:42 -0800

Thanks Enis!!

With your code I am able to do what I needed: get the URLs into a text
file.


Cheers,
Cha

Enis Soztutar wrote:
> 
> cha wrote:
>> First of all thanks for your reply.
>>   
> you're welcome.
> 
>> I am really confused, pardon me.
>> I don't know whether I need to put the given code in a new class in the
>> nutch directory.
>> Do I have to import other classes or packages? Is there anything I need to
>> take care of?
>>   
> I suggest you download Eclipse, then set up the project by following the
> tutorial on the Nutch wiki called running nutch on eclipse. Then, for
> example, create a new class in the org.apache.nutch.tools package and
> paste the previously mentioned code into it.
> 
>     // here fs is an instance of FileSystem, and seqFile is a Path to
>     // the crawldb
>     MapFile.Reader reader = new MapFile.Reader (fs, seqFile, conf);
> 
> Then, in the loop, change the line below from
> 
> out.println(key);
> 
> to
> 
> out.println("<url><loc>" + key + "</loc></url>");
> 
>> I have tried creating a new separate class in the nutch directory, but it
>> gives lots of errors about packages/classes not found. I am still trying
>> to figure out what is wrong there.
>>
>> Secondly, how will I be able to read the URLs from the crawldb once the
>> class is running? I have no idea how to figure that out.
>>
>> How can I fit the URL output into some XML format, i.e.:
>> <url>
>>   <loc>http://www.example.com/</loc>
>> </url>
>> <url>
>>   <loc>http://www.example1.com/</loc>
>> </url>
>> ...........
>> So can you please explain how I should do this?
>>
>> Thanks a lot for your time..
>>   
> Well, there is nothing more I can do except write the code myself : )
> You may first want to become more familiar with Java programming, if need be.
> Good luck.
>> Cheers,
>> Cha
>>
>> Enis Soztutar wrote:
>>   
>>> cha wrote:
>>>     
>>>> Thanks Enis,
>>>>
>>>> I am getting some idea from that.
>>>> Can you tell me in which class I should implement it?
>>>> I don't have Hadoop installed on my box.
>>>>
>>>>   
>>>>       
>>> Just make a new class in Nutch and put the code there : ) As long as
>>> you have the Hadoop jar in your classpath, you do not need to check out the
>>> Hadoop codebase.
>>>
>>>
>>>
>>>     
>>
>>   
> 
> 
> 
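
The change discussed in the thread (printing each crawldb key wrapped in
<url><loc>...</loc></url>) can be sketched on its own, without Hadoop on the
classpath. The class below is only an illustration under stated assumptions:
UrlXmlWriter, escapeXml, toUrlEntry and writeEntries are hypothetical names,
not part of Nutch or Hadoop, and the URLs are passed in as plain strings
rather than read from the crawldb. In the loop from the thread, the argument
would presumably be key.toString(). Escaping is added because a URL with a
query string usually contains '&', which is not valid as-is inside XML text.

```java
import java.io.PrintWriter;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Nutch or Hadoop.
class UrlXmlWriter {

    // Replace the five characters that are special in XML with entities.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Build one entry in the same single-line form as the println in the thread.
    static String toUrlEntry(String url) {
        return "<url><loc>" + escapeXml(url) + "</loc></url>";
    }

    // Write one entry per line; in the thread's loop each url would be a crawldb key.
    static void writeEntries(Iterable<String> urls, PrintWriter out) {
        for (String url : urls) {
            out.println(toUrlEntry(url));
        }
        out.flush();
    }

    public static void main(String[] args) {
        // Sample URLs stand in for keys read from the crawldb (assumption).
        List<String> urls = Arrays.asList(
                "http://www.example.com/",
                "http://www.example1.com/?a=1&b=2");
        writeEntries(urls, new PrintWriter(System.out));
    }
}
```

With this in place, the line in the loop would become something like
out.println(UrlXmlWriter.toUrlEntry(key.toString())); instead of
concatenating the raw key.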

-- 
View this message in context: 
http://www.nabble.com/extracting-urls-into-text-files-tf3409030.html#a9574929
Sent from the Nutch - User mailing list archive at Nabble.com.


