RE: More replication questions

2009-03-18 Thread Vauthrin, Laurent
Thanks for the responses.

If we used a poll interval of one second (for 1.4), wouldn't we still have to 
wait for the replication to finish?  In that case, couldn't it take minutes 
(depending on index size) to get that data on the slave?  Or would there be a 
lot less data to pull down because of the high replication frequency (i.e., 
will it only have small files to replicate)?

-----Original Message-----
From: solr-user-return-19721-laurent.vauthrin=disney@lucene.apache.org 
[mailto:solr-user-return-19721-laurent.vauthrin=disney@lucene.apache.org] 
On Behalf Of Noble Paul നോബിള്‍ नोब्ळ्
Sent: Tuesday, March 17, 2009 9:04 PM
To: solr-user@lucene.apache.org
Subject: Re: More replication questions

On Wed, Mar 18, 2009 at 12:34 AM, Vauthrin, Laurent
laurent.vauth...@disney.com wrote:
 Hello,



 I have a couple of questions relating to replication in Solr.  As far as
 I understand it, the replication approach for both 1.3 and 1.4 involves
 having the slaves poll the master for updates to the index.  We're
 curious to know if it's possible to have a more dynamic/quicker way to
 propagate updates.



 1. Is there a built-in mechanism for pushing out
 updates (inserts/deletes) received by the master to the slaves?
The pull mechanism in 1.4 may be good enough. The 'pollInterval' can
be as small as 1 second, so you will get the updates within a second.
Isn't that good enough?

 2. Is it discouraged to post updates to multiple Solr instances?
 (all instances can receive updates and fulfill query requests)
This is prone to serious errors: the Solr instances may not all be in sync.

 3. If that sort of capability is not supported, why was it not
 implemented this way?  (So that we don't repeat any mistakes)
Push-based replication is in the cards, but the implementation is not
trivial. In Solr, commits are already expensive, so a second's delay may
be all right.

 4. Has anyone else on the list attempted to do this?  The intent
 here is to achieve optimal performance while having the freshest data
 possible.



 Thanks,
 Laurent





-- 
--Noble Paul


Re: More replication questions

2009-03-18 Thread Noble Paul നോബിള്‍ नोब्ळ्
It depends on a few things:
1) the number of docs added
2) whether the index is optimized
3) autowarming

If only a few docs are added and the index is not optimized, the
replication will be done in milliseconds (the changed files
will be small). If there is no autowarming, there should be no delay
in seeing the new data.
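
To illustrate why a small commit replicates quickly while an optimize does not, here is a rough Python sketch. It is a simplified model, not the actual ReplicationHandler logic: it assumes the slave fetches only files it lacks or holds with a different size. The function names, file names, and sizes are all made up for illustration.

```python
from typing import Dict, List


def files_to_fetch(master_files: Dict[str, int],
                   slave_files: Dict[str, int]) -> List[str]:
    """Names of master files the slave lacks or holds with a different size.

    Simplified assumption: files are compared by name and size only.
    """
    return sorted(
        name for name, size in master_files.items()
        if slave_files.get(name) != size
    )


def bytes_to_fetch(master_files: Dict[str, int],
                   slave_files: Dict[str, int]) -> int:
    """Total size in bytes of the files the slave would have to download."""
    return sum(master_files[name]
               for name in files_to_fetch(master_files, slave_files))


# Hypothetical file listings (file name -> size in bytes).
slave = {"_1.cfs": 500_000_000, "segments_2": 300}

# A few docs added: one small new segment plus a new segments file.
after_small_commit = {"_1.cfs": 500_000_000, "_2.cfs": 40_000, "segments_3": 320}

# After an optimize everything is merged into new files.
after_optimize = {"_3.cfs": 500_040_000, "segments_4": 300}

print(files_to_fetch(after_small_commit, slave))  # ['_2.cfs', 'segments_3']
print(bytes_to_fetch(after_small_commit, slave))  # 40320
print(bytes_to_fetch(after_optimize, slave))      # 500040300
```

In this model the small commit costs about 40 KB of transfer, while the optimized index has to be pulled in full because none of its files exist on the slave.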


-- 
--Noble Paul


Re: More replication questions

2009-03-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, Mar 18, 2009 at 12:34 AM, Vauthrin, Laurent
laurent.vauth...@disney.com wrote:
 Hello,



 I have a couple of questions relating to replication in Solr.  As far as
 I understand it, the replication approach for both 1.3 and 1.4 involves
 having the slaves poll the master for updates to the index.  We're
 curious to know if it's possible to have a more dynamic/quicker way to
 propagate updates.



 1. Is there a built-in mechanism for pushing out
 updates (inserts/deletes) received by the master to the slaves?
The pull mechanism in 1.4 may be good enough. The 'pollInterval' can
be as small as 1 second, so you will get the updates within a second.
Isn't that good enough?

 2. Is it discouraged to post updates to multiple Solr instances?
 (all instances can receive updates and fulfill query requests)
This is prone to serious errors: the Solr instances may not all be in sync.

 3. If that sort of capability is not supported, why was it not
 implemented this way?  (So that we don't repeat any mistakes)
Push-based replication is in the cards, but the implementation is not
trivial. In Solr, commits are already expensive, so a second's delay may
be all right.

 4. Has anyone else on the list attempted to do this?  The intent
 here is to achieve optimal performance while having the freshest data
 possible.



 Thanks,
 Laurent





-- 
--Noble Paul
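
On the push question: one way to approximate push with the 1.4 pull mechanism might be to have the indexing client ask each slave to pull right after a commit on the master, using the replication handler's fetchindex command. Below is a minimal Python sketch of that idea. It assumes the handler is registered at /replication on each slave. The host names and helper names are hypothetical, and the sketch has not been run against a live Solr.

```python
from typing import Callable, Dict, Iterable
from urllib.parse import urlencode
from urllib.request import urlopen


def replication_url(core_url: str, command: str, **params: str) -> str:
    """Build a replication handler URL such as .../replication?command=fetchindex.

    Assumes the handler is mounted at /replication under core_url.
    """
    query = urlencode({"command": command, **params})
    return f"{core_url.rstrip('/')}/replication?{query}"


def trigger_fetch(slave_urls: Iterable[str],
                  timeout: float = 5.0,
                  opener: Callable = urlopen) -> Dict[str, bool]:
    """Ask every slave to pull the index now; map each slave URL to success."""
    results = {}
    for url in slave_urls:
        target = replication_url(url, "fetchindex")
        try:
            with opener(target, timeout=timeout) as resp:
                results[url] = resp.status == 200
        except OSError:
            # URLError and HTTPError are subclasses of OSError.
            results[url] = False
    return results


# Hypothetical slave address; only the URL is built here, no request is sent.
print(replication_url("http://slave1:8983/solr/", "fetchindex"))
# http://slave1:8983/solr/replication?command=fetchindex
```

Calling trigger_fetch(["http://slave1:8983/solr"]) after each commit on the master would start the pull without waiting for the next poll. The commit and autowarming cost on the slave still applies.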