Thanks Enis!!
With your code I am able to fulfill my purpose of getting the URLs into a
text file.
Cheers,
Cha
Enis Soztutar wrote:
>
> cha wrote:
>> First of all thanks for your reply.
>>
> you're welcome.
>
>> I am really confused, pardon me.
>> I don't know whether I need to put the given code in a new class in the
>> nutch directory.
>> Do I have to import other classes or packages? Is there anything else I
>> need to take care of?
>>
> I suggest you download Eclipse, then set up the project using the
> tutorial on the Nutch wiki called "running nutch on eclipse". Then, for
> example, create a new class in the org.apache.nutch.tools package and
> paste the previously mentioned code into it.
>
> // here fs is an instance of FileSystem, seqFile is a Path to
> // the crawldb
> MapFile.Reader reader = new MapFile.Reader (fs, seqFile, conf);
>
> Then, in the loop, change the line below from
>
> out.println(key);
>
> to
>
> out.println("<url><loc>" + key + "</loc></url>");
>
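The one-line change above can be sketched as a small standalone class. This is only a minimal illustration, not code from the thread: the class name UrlXmlWriter and its helper methods are hypothetical, and a plain list of strings stands in for the keys that the reader loop would supply, so the sketch runs without Hadoop on the classpath. The escaping step is an added assumption; it matters when a URL contains characters such as & that are not allowed as-is in XML text.

```java
import java.io.PrintWriter;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: formats crawldb keys (URLs) as <url><loc>...</loc></url> lines.
class UrlXmlWriter {

    // Escape the five characters that are special in XML text.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Build one entry; key.toString() is what string concatenation would use anyway.
    static String toUrlEntry(Object key) {
        return "<url><loc>" + escapeXml(key.toString()) + "</loc></url>";
    }

    // Write one entry per key, in iteration order.
    static void writeEntries(Iterable<?> keys, PrintWriter out) {
        for (Object key : keys) {
            out.println(toUrlEntry(key));
        }
        out.flush();
    }

    public static void main(String[] args) {
        // Stand-in for the keys read from the crawldb.
        List<String> keys = Arrays.asList(
                "http://www.example.com/",
                "http://www.example1.com/?a=1&b=2");
        writeEntries(keys, new PrintWriter(System.out));
    }
}
```

Inside the real loop, out.println(UrlXmlWriter.toUrlEntry(key)) would take the place of the hand-built string.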
>> I have tried creating a new separate class in the nutch directory, but it
>> gives lots of errors about packages/classes not being found. I am still
>> trying to figure out what is wrong there.
>>
>> Secondly, how can I read the URLs from the crawldb once the class is
>> running? I have no idea how to figure that out.
>>
>> How can I fit my URL output into an XML format, i.e.
>> <url>
>> <loc>http://www.example.com/</loc>
>> </url>
>> <url>
>> <loc>http://www.example1.com/</loc>
>> </url>
>> ...........
>> So can you please explain how I should do this?
>>
>> Thanks a lot for your time..
>>
> Well, there is nothing more I can do except write the code myself : )
> You can first try to become more familiar with Java programming if need be.
> Good luck
>> Cheers,
>> Cha
>>
>> Enis Soztutar wrote:
>>
>>> cha wrote:
>>>
>>>> Thanks Enis,
>>>>
>>>> I am getting some idea from that.
>>>> Can you tell me in which class I should implement that?
>>>> I don't have Hadoop installed on my box.
>>>>
>>>>
>>>>
>>> Just make a new class in nutch and put the code there : ) As long as
>>> you have the hadoop jar in your classpath, you do not need to check out
>>> the hadoop codebase.
>>>
>>>
>>>
>>>
>>
>>
>
>
>
--
View this message in context:
http://www.nabble.com/extracting-urls-into-text-files-tf3409030.html#a9574929
Sent from the Nutch - User mailing list archive at Nabble.com.