I just found an interesting thesis that explains how to modify Nutch into a
focused / topical crawler. It helped me a lot and may be useful to others:

http://wing.comp.nus.edu.sg/publications/theses/2009/markusHaenseThesis.pdf
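
For anyone looking for a starting point, here is a rough sketch of the kind of
"should we fetch further?" decision discussed in the quoted message below. It
is only an illustration and is not taken from the thesis. It uses no Nutch
classes: the TopicFilter interface, the KeywordTopicFilter class and the
keyword-overlap score are all hypothetical names and choices of mine, and a
real plugin would have to be wired into Nutch's own extension points.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical extension point for a topical decision; NOT part of the Nutch API. */
interface TopicFilter {
    /** Returns true if the outlinks of a page with this text should be followed. */
    boolean isOnTopic(String url, String pageText);
}

/** Naive filter: scores a page by the fraction of topic keywords it contains. */
class KeywordTopicFilter implements TopicFilter {
    private final Set<String> keywords = new HashSet<>();
    private final double threshold;

    KeywordTopicFilter(Set<String> topicKeywords, double threshold) {
        // Normalize keywords to lower case so matching is case-insensitive.
        for (String k : topicKeywords) {
            keywords.add(k.toLowerCase(Locale.ROOT));
        }
        this.threshold = threshold;
    }

    /** Fraction of topic keywords that occur as whole tokens in the page text. */
    double score(String pageText) {
        if (keywords.isEmpty() || pageText == null) {
            return 0.0;
        }
        // Split on anything that is not a letter or a digit.
        Set<String> tokens = new HashSet<>(Arrays.asList(
                pageText.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")));
        int hits = 0;
        for (String k : keywords) {
            if (tokens.contains(k)) {
                hits++;
            }
        }
        return (double) hits / keywords.size();
    }

    @Override
    public boolean isOnTopic(String url, String pageText) {
        // The URL is unused here; a real filter might also inspect it.
        return score(pageText) >= threshold;
    }
}
```

The idea would be to call something like isOnTopic(...) before the outlinks of
a page are passed on, and to drop them when it returns false. A keyword
overlap is just the simplest stand-in; a trained classifier could sit behind
the same interface.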



MyD wrote:
> 
> Hi @ all,
> 
> I'd like to turn Nutch into a focused / topical crawler. I started to
> analyze the code and think I have found the right piece of code. I just
> wanted to know if I am on the right track. I think the right place to
> implement the decision whether to fetch further is the output method of the
> Fetcher class, every time we call the collect method of the OutputCollector
> object.
> 
> private ParseStatus output(Text key, CrawlDatum datum, Content content,
> ProtocolStatus pstatus, int status) {
> ...
> output.collect(...);
> ...
> }
> 
> Would you mind letting me know the best way to turn this decision into
> a plugin? I was thinking of taking an approach similar to the scoring
> filters. Thanks in advance.
> 
> Cheers,
> MyD
> 

-- 
View this message in context: 
http://www.nabble.com/Nutch-Topical---Focused-Crawl-tp22765848p25764131.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.
