Thanks for the responses :)

So the size of the segments would then, I guess, determine the latency between
crawling and indexing.

My colleague and I will look further into the scripts to see how the diffs get
pushed to Solr.
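
For reference, here is a minimal sketch of the script-based cycle Julien
describes, using the individual Nutch 1.x commands. The paths, loop count,
-topN value, and Solr URL are all assumptions to be adapted to your setup:

```shell
#!/bin/sh
# Sketch of an incremental crawl-and-index loop (Nutch 1.x commands).
# CRAWL_DIR layout and SOLR_URL are assumptions; adjust for your install.
CRAWL_DIR=crawl
SOLR_URL=http://localhost:8983/solr

# Seed the crawldb from a directory of seed-URL files.
bin/nutch inject $CRAWL_DIR/crawldb urls

for i in 1 2 3; do
  # Generate a fetch list; -topN bounds the segment size, and therefore
  # the latency between crawling and indexing discussed above.
  bin/nutch generate $CRAWL_DIR/crawldb $CRAWL_DIR/segments -topN 1000
  SEGMENT=$(ls -d $CRAWL_DIR/segments/* | tail -1)

  bin/nutch fetch $SEGMENT
  bin/nutch parse $SEGMENT
  bin/nutch updatedb $CRAWL_DIR/crawldb $SEGMENT

  # Refresh the link database and push just this segment to Solr, so each
  # cycle's pages become searchable without waiting for the whole crawl.
  bin/nutch invertlinks $CRAWL_DIR/linkdb $SEGMENT
  bin/nutch solrindex $SOLR_URL $CRAWL_DIR/crawldb $CRAWL_DIR/linkdb $SEGMENT
done
```

Indexing per segment inside the loop, rather than once at the end, is what
gives the near-real-time behaviour Matthew was asking about, at the cost of
relevancy signals from a still-incomplete link graph.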

Thanks again

M


On Tue, Jul 12, 2011 at 6:12 PM, lewis john mcgibbney <
lewis.mcgibb...@gmail.com> wrote:

> To add to Julien's comments there was a contribution made by Gabriele a
> while ago which addressed this issue (however I have not used his scripts
> extensively). They might be of interest for a look. Try the link below
>
> http://wiki.apache.org/nutch/Whole-Web%20Crawling%20incremental%20script
>
> On Tue, Jul 12, 2011 at 2:15 PM, Julien Nioche <
> lists.digitalpeb...@gmail.com> wrote:
>
>> Hi Matthew,
>>
>> This is usually achieved by writing a script containing the individual
>> Nutch commands (as opposed to calling 'nutch crawl') and indexing at the end
>> of a generate-fetch-parse-update-linkdb sequence. You don't need any plugins
>> for that.
>>
>> HTH
>>
>> Julien
>>
>>
>> On 12 July 2011 13:35, Matthew Painter <matthew.pain...@kusiri.com>wrote:
>>
>>> Hi all,
>>>
>>> I was wondering about the feasibility of creating a plugin for Nutch that
>>> creates a Solr update command and adds it to a queue for indexing right
>>> after a page is parsed, rather than when crawling has finished.
>>>
>>> This would allow you to do "real-time" indexing when crawling.
>>>
>>> Drawbacks: Not able to use the graph to give relevancy information.
>>>
>>> Wondering what initial thoughts are about this?
>>>
>>> Thanks :)
>>>
>>>
>>>
>>
>>
>> --
>> *
>> *Open Source Solutions for Text Engineering
>>
>> http://digitalpebble.blogspot.com/
>> http://www.digitalpebble.com
>>
>
>
>
> --
> *Lewis*
>
>
