Re: How can we know if 2 lucene indexes are same?

叶双明 Fri, 05 Sep 2008 06:33:42 -0700

Just think about the cost of indexing that many documents on each
slave. It may slow down the responses from live slaves.


I think the search service on each slave must hold an IndexSearcher or some
equivalent object, while an IndexWriter indexes those documents. Is the
IndexSearcher actually affected by the indexing process?
After the indexing finishes, reopen the IndexSearcher to load the new data.
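
A minimal sketch of the "keep searching, then reopen" idea, using only the
Java standard library. Snapshot and SearchService are hypothetical names
standing in for an opened IndexSearcher and the slave's search service; they
are not Lucene classes. The assumption is that a searcher works on a fixed
view of the index taken when it was opened, so queries keep using the old view
until a new one is swapped in after indexing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for an opened searcher: an immutable view of the index.
final class Snapshot {
    private final List<String> docs;

    Snapshot(List<String> docs) {
        // Copy defensively so later indexing cannot change this view.
        this.docs = Collections.unmodifiableList(new ArrayList<>(docs));
    }

    boolean contains(String doc) {
        return docs.contains(doc);
    }
}

// Hypothetical search service running on a slave.
final class SearchService {
    private final AtomicReference<Snapshot> current;

    SearchService(Snapshot initial) {
        this.current = new AtomicReference<>(initial);
    }

    // Each query runs against whichever snapshot is current when it starts.
    boolean search(String doc) {
        return current.get().contains(doc);
    }

    // Called once indexing has finished: publish the new view in one atomic step.
    void reopen(Snapshot fresh) {
        current.set(fresh);
    }
}
```

Queries that started before reopen() finish against the old snapshot, and
later queries see the new one. In a real service the old searcher would also
need to be closed once no query is using it.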




2008/9/5, 叶双明 <[EMAIL PROTECTED]>:
>
> This is getting more and more complex. Actually, I have a small index
> system that can be configured with multiple index servers for querying.
>
> In my opinion, because index updates are synchronized between the
> different threads that update the index:
>
> for indexing new data: process the data to be indexed at the master and,
> once the documents are built, add them to the index at the master and to
> the index at every slave,
>
> for deleting data: delete at the master and at every slave at the same time.
>
> I think we can then trust that the index is indeed the same at the master
> and at all slaves, barring unexpected errors on an individual node.
>
> I have also heard that there are frameworks for syncing data between
> computers, but I have only heard of them.
>
> Sorry for my English. :)
>
>
>
>
> 2008/9/5, Michael McCandless <[EMAIL PROTECTED]>:
>>
>>
>> Shalin Shekhar Mangar wrote:
>>
>>> Let me try to explain.
>>>
>>> I have a master where indexing is done. I have multiple slaves for
>>> querying.
>>>
>>> If I commit+optimize on the master and then rsync the index, the data
>>> transferred on the network is huge. An alternate way is to commit on
>>> master, transfer the delta to the slave and issue an optimize on the
>>> slave. This is very fast because less data is transferred on the network.
>>>
>>
>> Large segment merges will also send huge traffic.  You may just want to
>> send all updates (document adds/deletes) to all slaves directly?  It'd be
>> nice if you could somehow NOT sync the effects of segment merging, but do
>> sync doc add/deletes... not sure how to do that.
>>
>>> However, we need to ensure that the index on the slave is actually in sync
>>> with the master. So that on another commit, we can blindly transfer the
>>> delta to the slave.
>>>
>>
>> I assume your app ensures that no deltas arrive at the slave while it's
>> running optimize?  So then the question becomes (I think) "if two indices
>> are identical to begin with, and I separately run optimize on each, will the
>> resulting two optimized indices be identical?".
>>
>> By "in sync" you also require the final segment name (after optimize) is
>> identical right?
>>
>> I think the answer is yes, but I'm not certain unless I think more about
>> it.  Also this behavior is not "promised" in Lucene's API.
>>
>> Merges for optimize are now allowed to run concurrently (by default, with
>> ConcurrentMergeScheduler), except for the final (< mergeFactor segments)
>> merge, which waits until others have finished.  So if there are 7 obvious
>> merges necessary to optimize, 3 will run concurrently, while 4 wait.  Those
>> 4 then run as the merges finish over time, which may happen in different
>> orders for each index and so different merges may run.  Then the final merge
>> will run and I *think* the net number of merges that ran should always be
>> the same and so the final segment name should be the same.
>>
>> Mike
>>
>
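
On the question in the subject line, one rough way to check whether two index
directories are the same is to compare them file by file. The sketch below
uses only the Java standard library and does not touch the Lucene API. The
class name IndexDirCompare is hypothetical, and it assumes that both
directories are idle (no writer running) and that "the same" means identical
file names with identical bytes, which is stricter than "returns the same
search results". It skips a file named write.lock on the assumption that a
lock file is not index data.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: compares two index directories by file name and checksum.
final class IndexDirCompare {

    // Maps each regular file name in dir to the hex SHA-256 of its contents.
    static Map<String, String> fingerprint(Path dir)
            throws IOException, NoSuchAlgorithmException {
        Map<String, String> result = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                String name = file.getFileName().toString();
                // Assumption: skip subdirectories and the lock file.
                if (!Files.isRegularFile(file) || name.equals("write.lock")) {
                    continue;
                }
                MessageDigest digest = MessageDigest.getInstance("SHA-256");
                try (InputStream in = Files.newInputStream(file)) {
                    byte[] buf = new byte[8192];
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        digest.update(buf, 0, n);
                    }
                }
                StringBuilder hex = new StringBuilder();
                for (byte b : digest.digest()) {
                    hex.append(String.format("%02x", b));
                }
                result.put(name, hex.toString());
            }
        }
        return result;
    }

    // True only if both directories hold the same file names with the same bytes.
    static boolean sameIndexFiles(Path master, Path slave)
            throws IOException, NoSuchAlgorithmException {
        return fingerprint(master).equals(fingerprint(slave));
    }
}
```

This check fails as soon as the final segment name differs between master and
slave, even if the documents are the same, so it only helps when the
optimized indices are expected to be byte-for-byte identical.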
