Thanks for the responses :) So the size of the segments, I guess, would then determine the latency between crawling and indexing.
My colleague and I will look more into the scripts to see how the diffs get pushed to Solr. Thanks again

M

On Tue, Jul 12, 2011 at 6:12 PM, lewis john mcgibbney <lewis.mcgibb...@gmail.com> wrote:

> To add to Julien's comments, there was a contribution made by Gabriele a
> while ago which addressed this issue (however, I have not used his scripts
> extensively). They might be of interest for a look. Try the link below:
>
> http://wiki.apache.org/nutch/Whole-Web%20Crawling%20incremental%20script
>
> On Tue, Jul 12, 2011 at 2:15 PM, Julien Nioche <lists.digitalpeb...@gmail.com> wrote:
>
>> Hi Matthew,
>>
>> This is usually achieved by writing a script containing the individual
>> Nutch commands (as opposed to calling 'nutch crawl') and indexing at the
>> end of a generate-fetch-parse-update-linkdb sequence. You don't need any
>> plugins for that.
>>
>> HTH
>>
>> Julien
>>
>>
>> On 12 July 2011 13:35, Matthew Painter <matthew.pain...@kusiri.com> wrote:
>>
>>> Hi all,
>>>
>>> I was wondering about the feasibility of creating a plugin for Nutch
>>> that creates a Solr update command and adds it to a queue for indexing
>>> after it first parses the page, rather than when crawling has finished.
>>>
>>> This would allow you to do "real-time" indexing while crawling.
>>>
>>> Drawback: not being able to use the link graph to give relevancy
>>> information.
>>>
>>> Wondering what initial thoughts are about this?
>>>
>>> Thanks :)
>>
>>
>> --
>> *Open Source Solutions for Text Engineering*
>>
>> http://digitalpebble.blogspot.com/
>> http://www.digitalpebble.com
>
>
> --
> *Lewis*
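P.S. For anyone finding this thread later: the generate-fetch-parse-update-linkdb sequence Julien describes might be scripted roughly as below. This is only a sketch, not the contributed script from the wiki link; the paths, the topN value, the Solr URL, and the segment-naming placeholder are all assumptions. The `echo` prefix makes it a dry run that just prints the commands, so you can see the sequence without a Nutch install.

```shell
#!/bin/sh
# Sketch of an incremental crawl-and-index loop using individual Nutch
# commands instead of a single 'nutch crawl'. All paths are assumptions.
CRAWL_DB=crawl/crawldb
LINK_DB=crawl/linkdb
SEGMENTS_DIR=crawl/segments
SOLR_URL=http://localhost:8983/solr   # assumed Solr endpoint
DEPTH=3                               # number of generate-fetch cycles

# Dry run: 'echo' prints each command rather than executing it.
# Drop the 'echo ' prefix to run for real.
NUTCH="echo bin/nutch"

for i in $(seq 1 $DEPTH); do
  # Generate a fetch list of up to 1000 URLs into a new segment.
  $NUTCH generate $CRAWL_DB $SEGMENTS_DIR -topN 1000
  # Placeholder segment name; a real script would pick the newest
  # directory, e.g. SEGMENT=$(ls -d $SEGMENTS_DIR/* | tail -1)
  SEGMENT=$SEGMENTS_DIR/segment_$i
  $NUTCH fetch $SEGMENT
  $NUTCH parse $SEGMENT
  # Merge the new crawl results back into the crawldb.
  $NUTCH updatedb $CRAWL_DB $SEGMENT
done

# Build the link database, then push everything to Solr at the end.
$NUTCH invertlinks $LINK_DB -dir $SEGMENTS_DIR
$NUTCH solrindex $SOLR_URL $CRAWL_DB -linkdb $LINK_DB $SEGMENTS_DIR/*
```

Indexing after each `updatedb` inside the loop, rather than once at the end, is what shrinks the crawl-to-index latency discussed above; the per-segment cost is that `solrindex` runs more often.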