In Ocean I had to use a transaction log and execute everything through it, much like SQL database replication, and then let each node handle its own merging process. Syncing the indexes is used only to get a new node up to speed; otherwise it's avoided for the reasons mentioned in the previous email.
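To make the idea concrete, here is a minimal sketch of that approach. It is not Ocean's actual code: the names (LogEntry, TransactionLog, Node, catch_up, bootstrap_from) are hypothetical, documents are plain strings, and segments and merging are not modeled at all. It only shows document adds/deletes being replayed from a log, with a one-time state copy used to bring a new node up to speed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LogEntry:
    seq: int                   # 1-based, monotonically increasing
    op: str                    # "add" or "delete"
    doc_id: str
    doc: Optional[str] = None  # document body; unused for deletes


class TransactionLog:
    """Append-only log of document-level updates (hypothetical)."""

    def __init__(self) -> None:
        self._entries: List[LogEntry] = []

    def append(self, op: str, doc_id: str, doc: Optional[str] = None) -> LogEntry:
        entry = LogEntry(len(self._entries) + 1, op, doc_id, doc)
        self._entries.append(entry)
        return entry

    def since(self, seq: int) -> List[LogEntry]:
        """Return entries whose sequence number is greater than seq."""
        return self._entries[seq:]


class Node:
    """A replica that replays the log instead of copying index files."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}
        self.applied_seq = 0   # highest sequence number applied so far

    def catch_up(self, log: TransactionLog) -> int:
        """Apply all entries not yet seen; return how many were applied."""
        applied = 0
        for entry in log.since(self.applied_seq):
            if entry.op == "add":
                self.docs[entry.doc_id] = entry.doc or ""
            elif entry.op == "delete":
                self.docs.pop(entry.doc_id, None)
            else:
                raise ValueError(f"unknown op: {entry.op}")
            self.applied_seq = entry.seq
            applied += 1
        return applied

    @classmethod
    def bootstrap_from(cls, other: "Node") -> "Node":
        """One-time full copy, standing in for syncing the index to a new node."""
        node = cls()
        node.docs = dict(other.docs)
        node.applied_seq = other.applied_seq
        return node
```

Because each node records the last sequence number it applied, calling catch_up again is a no-op until new entries arrive, and a node created with bootstrap_from only replays entries written after the copy.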

On Fri, Sep 5, 2008 at 8:33 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:
>
> Shalin Shekhar Mangar wrote:
>
>> Let me try to explain.
>>
>> I have a master where indexing is done. I have multiple slaves for
>> querying.
>>
>> If I commit+optimize on the master and then rsync the index, the data
>> transferred on the network is huge. An alternate way is to commit on
>> master, transfer the delta to the slave and issue an optimize on the
>> slave. This is very fast because less data is transferred on the network.
>
> Large segment merges will also send huge traffic. You may just want to send
> all updates (document adds/deletes) to all slaves directly? It'd be nice if
> you could somehow NOT sync the effects of segment merging, but do sync doc
> add/deletes... not sure how to do that.
>
>> However, we need to ensure that the index on the slave is actually in sync
>> with the master. So that on another commit, we can blindly transfer the
>> delta to the slave.
>
> I assume your app ensures that no deltas arrive to the slave while it's
> running optimize? So then the question becomes (I think) "if two indices
> are identical to begin with, and I separately run optimize on each, will the
> resulting two optimized indices be identical?".
>
> By "in sync" you also require the final segment name (after optimize) is
> identical right?
>
> I think the answer is yes, but I'm not certain unless I think more about it.
> Also this behavior is not "promised" in Lucene's API.
>
> Merges for optimize are now allowed to run concurrently (by default, with
> ConcurrentMergeScheduler), except for the final (< mergeFactor segments)
> merge, which waits until others have finished. So if there are 7 obvious
> merges necessary to optimize, 3 will run concurrently, while 4 wait. Those
> 4 then run as the merges finish over time, which may happen in different
> orders for each index and so different merges may run. Then the final merge
> will run and I *think* the net number of merges that ran should always be
> the same and so the final segment name should be the same.
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]