I just found an interesting thesis that explains how to modify Nutch into a focused / topical crawler. It helped me a lot and may be useful to others:
http://wing.comp.nus.edu.sg/publications/theses/2009/markusHaenseThesis.pdf

MyD wrote:
>
> Hi all,
>
> I'd like to turn Nutch into a focused / topical crawler. I started to
> analyze the code and think I have found the right piece of code. I just
> wanted to know if I am on the right track. I think the right place to
> implement the decision whether to fetch further is the output method of
> the Fetcher class, every time we call the collect method of the
> OutputCollector object.
>
> private ParseStatus output(Text key, CrawlDatum datum, Content content,
>     ProtocolStatus pstatus, int status) {
>   ...
>   output.collect(...);
>   ...
> }
>
> Could you let me know the best way to turn this decision into a plugin?
> I was thinking of taking an approach similar to the scoring filters.
> Thanks in advance.
>
> Cheers,
> MyD
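To make the plugin idea from the quoted question a bit more concrete, here is a minimal, self-contained sketch of what a "should we follow links from this page?" decision could look like. It deliberately does not use any Nutch classes: FetchDecisionFilter, KeywordTopicFilter, the keyword set and the threshold are all hypothetical names made up for illustration, and simple keyword overlap is only a stand-in for whatever relevance measure a real focused crawler would use. How such a filter would be wired into the Fetcher or registered as an extension point is not shown here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical extension point (NOT part of Nutch): decides whether to crawl further from a page. */
interface FetchDecisionFilter {
    boolean shouldFollow(String url, String pageText);
}

/** Assumed relevance measure: fraction of topic keywords that occur in the page text. */
class KeywordTopicFilter implements FetchDecisionFilter {
    private final Set<String> keywords = new HashSet<>();
    private final double threshold;

    KeywordTopicFilter(Set<String> keywords, double threshold) {
        // Normalize keywords to lower case so matching is case-insensitive.
        for (String k : keywords) {
            this.keywords.add(k.toLowerCase(Locale.ROOT));
        }
        this.threshold = threshold;
    }

    /** Returns a value in [0, 1]: matched keywords divided by total keywords. */
    double relevance(String pageText) {
        if (pageText == null || pageText.isEmpty() || keywords.isEmpty()) {
            return 0.0;
        }
        // Split on runs of non-word characters to get a set of lower-cased tokens.
        Set<String> tokens = new HashSet<>(
                Arrays.asList(pageText.toLowerCase(Locale.ROOT).split("\\W+")));
        int hits = 0;
        for (String k : keywords) {
            if (tokens.contains(k)) {
                hits++;
            }
        }
        return (double) hits / keywords.size();
    }

    @Override
    public boolean shouldFollow(String url, String pageText) {
        // The URL is unused in this sketch; a real filter might also inspect it.
        return relevance(pageText) >= threshold;
    }
}
```

In a crawler, the idea would be to call shouldFollow(...) just before the page's outlinks are emitted and to skip emitting them when it returns false, similar in spirit to how the scoring filters are consulted.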