Re: Replicating Lucene Index with out SOLR

Shalin Shekhar Mangar Thu, 28 Aug 2008 07:17:48 -0700

Slightly off-topic. Robert -- you may want to look at SOLR-561 -- Solr
replication by Solr (for windows also) which is under development.


https://issues.apache.org/jira/browse/SOLR-561

On Thu, Aug 28, 2008 at 7:39 PM, Robert Stewart <[EMAIL PROTECTED]> wrote:

> We don't use Solr, since we run on Windows <sigh>;(</sigh>, but we did
> implement very similar snapshot replication.  We have 2 master index servers
> building indexes, partitioned by document.  Every minute, we stop the index
> writer and create a local snapshot (on the master server) in a directory
> named YYYYMMDDHHMMSS for the current timestamp.  Each query server has a
> background thread which periodically looks in the remote directories on the
> master server for a new snapshot directory.  If it finds one, it copies the
> new snapshot locally to the query server, using the following algorithm:
>
> 1. Make a local copy of the existing local snapshot:
>        a. Copy all "changeable" files (segments file, etc.)
>        b. Create NTFS "hard-links" for all other files (index files)
> 2. Copy any new files in the new remote index which do not already exist in
> the local snapshot (since Lucene does not ever modify existing index files,
> we only need to copy new files and the new segments file).
> 3. Delete any files which no longer exist in the remote index (this only
> deletes the local hard-link, not the actual file in the current snapshot).
> 4. Open an index reader on the new local snapshot, and run some "warming"
> queries.
> 5. Switch the current index reader object to the new index reader object so
> searches go against the new local snapshot.
>
> Step 1 above is also used on master index server when making new local
> snapshots.
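
A minimal sketch of step 1 above (not Robert's actual code, which was not
posted): copy the "changeable" files and hard-link everything else. It uses
java.nio.file.Files.createLink, available in Java 7 and later. The class name
SnapshotLinker, the method names, and the rule that any file whose name starts
with "segments" counts as changeable are assumptions made for illustration.
Hard links also require both directories to be on the same volume.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: builds a new snapshot directory from an existing one. */
class SnapshotLinker {

    // Assumption: only files named "segments*" are ever rewritten in place.
    static boolean isChangeable(String fileName) {
        return fileName.startsWith("segments");
    }

    /** Copies changeable files and hard-links all other files into 'fresh'. */
    static void linkSnapshot(Path existing, Path fresh) throws IOException {
        Files.createDirectories(fresh);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(existing)) {
            for (Path src : files) {
                if (!Files.isRegularFile(src)) {
                    continue; // index directories are flat; skip anything else
                }
                String name = src.getFileName().toString();
                Path dst = fresh.resolve(name);
                if (isChangeable(name)) {
                    // Real copy, so later rewrites do not touch the old snapshot.
                    Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
                } else {
                    // New directory entry pointing at the same file data.
                    Files.createLink(dst, src);
                }
            }
        }
    }
}
```

Because the index files are linked rather than copied, a new snapshot costs
almost no extra disk space, and deleting a file from one snapshot leaves the
data reachable from the others.
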
>
> Also, note that we don't use rsync.  You do not need it.  You only need to
> make hard-links, and always copy any "changeable" files, such as "segments"
> file.  Lucene does not modify index files, only creates new ones (and
> deletes old ones after a merge/optimization).
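
The rsync-free copy (steps 2 and 3 above) could look roughly like the sketch
below. It is an illustration only: SnapshotSync, sync and names are
hypothetical, "remote" stands for the master's snapshot directory as seen from
the query server (for example a network share), and the "segments" prefix rule
is again an assumption.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: brings a local snapshot in line with a remote one. */
class SnapshotSync {

    // Assumption: only files named "segments*" are ever rewritten in place.
    static boolean isChangeable(String fileName) {
        return fileName.startsWith("segments");
    }

    /** Names of the regular files directly inside 'dir'. */
    static Set<String> names(Path dir) throws IOException {
        Set<String> out = new HashSet<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path p : files) {
                if (Files.isRegularFile(p)) {
                    out.add(p.getFileName().toString());
                }
            }
        }
        return out;
    }

    /** Returns the number of files copied from 'remote' into 'local'. */
    static int sync(Path remote, Path local) throws IOException {
        Set<String> remoteNames = names(remote);
        Set<String> localNames = names(local);
        int copied = 0;
        for (String name : remoteNames) {
            // Existing index files never change, so only fetch what is missing,
            // plus the changeable files, which are always refreshed.
            if (isChangeable(name) || !localNames.contains(name)) {
                Files.copy(remote.resolve(name), local.resolve(name),
                        StandardCopyOption.REPLACE_EXISTING);
                copied++;
            }
        }
        for (String name : localNames) {
            if (!remoteNames.contains(name)) {
                // Removes this directory entry only; other hard links survive.
                Files.delete(local.resolve(name));
            }
        }
        return copied;
    }
}
```
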
>
> We use the following settings for the index writer:
>
> MergeFactor = 2
> MaxBufferedDocs = 10
> MaxMergeDocs = 1,000,000
>
> This gives many segments but search is still very fast, and the total MB of
> new files copied for each snapshot is relatively small.
>
> Currently we have about 25 million documents in the master index.
>
> -----Original Message-----
> From: Bill Au [mailto:[EMAIL PROTECTED]
> Sent: Thursday, August 28, 2008 8:22 AM
> To: java-user@lucene.apache.org
> Subject: Re: Replicating Lucene Index with out SOLR
>
> The snapinstaller script invokes the commit command to trigger Solr to do a
> commit, which opens a new index reader and then auto-warms the caches.  You
> will need to replace that with your own code to do the same for your Lucene
> index.
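
One way to structure that replacement code is sketched below: warm the new
reader first, then publish it atomically while the old one keeps serving
requests. This is a generic sketch, not Solr's implementation. SearcherHolder
and warmAndSwap are hypothetical names, and the type parameter S stands in for
whatever object you search with (for example a Lucene IndexSearcher), so the
example runs without Lucene on the classpath.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BiConsumer;

/** Hypothetical holder for the searcher that is currently serving queries. */
class SearcherHolder<S> {

    private final AtomicReference<S> current = new AtomicReference<>();

    /** Searcher that new requests should use; null until the first swap. */
    S get() {
        return current.get();
    }

    /**
     * Runs every warming query against 'fresh', then makes it the current
     * searcher. Returns the previous searcher (or null) so the caller can
     * close it once in-flight searches have finished.
     */
    S warmAndSwap(S fresh, List<String> warmingQueries, BiConsumer<S, String> runQuery) {
        for (String query : warmingQueries) {
            runQuery.accept(fresh, query); // results are discarded, caches get filled
        }
        return current.getAndSet(fresh);
    }
}
```

Request threads call get() for each search, so they move to the new searcher
only after warming has completed. Closing the returned old searcher safely
(for instance with reference counting) is left out of this sketch.
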
>
> On Thu, Aug 28, 2008 at 1:47 AM, rahul_k123 <[EMAIL PROTECTED]> wrote:
>
> >
> > Can I make use of the Solr scripts for this purpose?
> >
> >
> > The snapinstaller runs on the slave after a snapshot has been pulled from
> > the master. This signals the local Solr server to open a new index reader,
> > then auto-warming of the cache(s) begins (in the new reader), while other
> > requests continue to be served by the original index reader.
> >
> > How can I achieve the above in my case?
> >
> >
> > Otis Gospodnetic wrote:
> > >
> > > You don't need to copy the whole index every time if you do incremental
> > > indexing/updates and don't optimize the index before copying.  If you use
> > > rsync for copying the index, only the new/modified files will be copied.
> > > This is what the Solr replication scripts do, too.
> > >
> > > Otis
> > > --
> > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > >
> > >
> > >
> > > ----- Original Message ----
> > >> From: rahul_k123 <[EMAIL PROTECTED]>
> > >> To: [EMAIL PROTECTED]
> > >> Sent: Wednesday, August 27, 2008 11:36:07 PM
> > >> Subject: Re: Replicating Lucene Index with out SOLR
> > >>
> > >>
> > >> Currently we index on A at regular intervals.
> > >>
> > >> -copy the index
> > >>      Copying the whole index every time?
> > >>
> > >> Currently I am investigating how I can make use of the SOLR replication
> > >> scripts to achieve this.
> > >>
> > >>
> > >> Is there anyone who did this without SOLR before?
> > >>
> > >>
> > >> Thanks
> > >>
> > >>
> > >>
> > >> Otis Gospodnetic wrote:
> > >> >
> > >> > Hi,
> > >> >
> > >> > You may want to ask on the java-user list (more subscribers), which I'm
> > >> > CC-ing, so we can continue discussion there.
> > >> > I think you will have to implement your own logic that runs on A and does
> > >> > something like this:
> > >> >
> > >> > - stop adding new docs
> > >> > - call commit on the IndexWriter
> > >> >
> > >> > - copy the index
> > >> > - resume indexing
> > >> >
> > >> > Otis
> > >> > --
> > >> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > >> >
> > >> >
> > >> >
> > >> > ----- Original Message ----
> > >> >> From: rahul_k123
> > >> >> To: [EMAIL PROTECTED]
> > >> >> Sent: Thursday, August 28, 2008 1:34:41 AM
> > >> >> Subject: Replicating Lucene Index with out SOLR
> > >> >>
> > >> >>
> > >> >> I have the following requirement.
> > >> >>
> > >> >> Right now we have multiple indexes serving our web application.  Our
> > >> >> indexes are around 30 GB in size.
> > >> >>
> > >> >> We want to replicate the index data so that we can use the replicas to
> > >> >> distribute the search load.
> > >> >>
> > >> >> This is what we need ideally.
> > >> >>
> > >> >> A - (supports writes and reads)
> > >> >>
> > >> >> A1 - Replicated index (supports reads).  We want to synchronize this
> > >> >> every 5 mins.
> > >> >>
> > >> >>
> > >> >>
> > >> >> Any help is appreciated.  We are not using SOLR.
> > >> >>
> > >> >> I am also interested in knowing the best way to scale my application
> > >> >> by adding more boxes for search if our load increases.
> > >> >>
> > >> >> Thanks.
> > >> >>
> > >> >> --
> > >> >> View this message in context:
> > >> >> http://www.nabble.com/Replicating-Lucene-Index-with-out-SOLR-tp19191752p19191752.html
> > >> >> Sent from the Lucene - General mailing list archive at Nabble.com.
> > >> >
> > >> >
> > >> >
> > >>
> > >> --
> > >> View this message in context:
> > >> http://www.nabble.com/Replicating-Lucene-Index-with-out-SOLR-tp19191752p19193670.html
> > >> Sent from the Lucene - General mailing list archive at Nabble.com.
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail: [EMAIL PROTECTED]
> > >
> > >
> > >
> >
> > --
> > View this message in context:
> > http://www.nabble.com/Replicating-Lucene-Index-with-out-SOLR-tp19193696p19194576.html
> > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >
>


-- 
Regards,
Shalin Shekhar Mangar.
