> Thanks for the responses :)
> 
> So I guess the size of the segments would then determine the latency
> between crawling and indexing.

The size of your crawldb may matter even more in some cases. If your segment 
has just one file and your crawldb has many millions of entries, the indexing 
takes forever.
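
To make this concrete: the indexer job reads the whole crawldb (and the
linkdb) as input alongside the segment, so even a one-page segment pays the
cost of scanning the full crawldb. Roughly, assuming typical paths and a
local Solr URL (exact solrindex arguments differ slightly between Nutch
versions):

  # crawldb and linkdb are full inputs to the indexing job, so its runtime
  # grows with the crawldb size even if $SEGMENT holds a single page
  bin/nutch solrindex http://localhost:8983/solr/ crawl/crawldb crawl/linkdb $SEGMENT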

> 
> My colleague and I will look more into the scripts to see how the diffs get
> pushed to Solr.
> 
> Thanks again
> 
> M
> 
> 
> On Tue, Jul 12, 2011 at 6:12 PM, lewis john mcgibbney <
> 
> lewis.mcgibb...@gmail.com> wrote:
> > To add to Julien's comments there was a contribution made by Gabriele a
> > while ago which addressed this issue (however I have not used his scripts
> > extensively). They might be of interest for a look. Try the link below
> > 
> > http://wiki.apache.org/nutch/Whole-Web%20Crawling%20incremental%20script
> > 
> > On Tue, Jul 12, 2011 at 2:15 PM, Julien Nioche <
> > 
> > lists.digitalpeb...@gmail.com> wrote:
> >> Hi Matthew,
> >> 
> >> This is usually achieved by writing a script containing the individual
> >> Nutch commands (as opposed to calling 'nutch crawl') and indexing at the
> >> end of each generate-fetch-parse-update-linkdb sequence. You don't need
> >> any plugins for that.
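> >> 
> >> A rough sketch of one such cycle, assuming typical paths and a local
> >> Solr URL (the -topN value is illustrative, and the exact solrindex
> >> arguments vary a little between Nutch versions):
> >> 
> >>   #!/bin/bash
> >>   # one generate-fetch-parse-update-invertlinks-index cycle
> >>   bin/nutch generate crawl/crawldb crawl/segments -topN 1000
> >>   SEGMENT=`ls -d crawl/segments/* | tail -1`   # newest segment
> >>   bin/nutch fetch $SEGMENT
> >>   bin/nutch parse $SEGMENT
> >>   bin/nutch updatedb crawl/crawldb $SEGMENT
> >>   bin/nutch invertlinks crawl/linkdb -dir crawl/segments
> >>   bin/nutch solrindex http://localhost:8983/solr/ crawl/crawldb crawl/linkdb $SEGMENT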
> >> 
> >> HTH
> >> 
> >> Julien
> >> 
> >> On 12 July 2011 13:35, Matthew Painter <matthew.pain...@kusiri.com> wrote:
> >>> Hi all,
> >>> 
> >>> I was wondering about the feasibility of creating a plugin for Nutch
> >>> that creates a Solr update command and adds it to a queue for
> >>> indexing as soon as it parses a page, rather than when crawling has
> >>> finished.
> >>> 
> >>> This would allow you to do "real-time" indexing while crawling.
> >>> 
> >>> Drawback: you would not be able to use the link graph to provide
> >>> relevancy information.
> >>> 
> >>> What are your initial thoughts on this?
> >>> 
> >>> Thanks :)
> >> 
> >> --
> >> Open Source Solutions for Text Engineering
> >> 
> >> http://digitalpebble.blogspot.com/
> >> http://www.digitalpebble.com
> > 
> > --
> > *Lewis*
