Re: Reindex and external indexes - Possibility of stale index data

Thomas Mueller Tue, 21 Oct 2014 02:40:25 -0700

Hi,

Yes, that's my point. I wouldn't use MVCC for reindexing the Lucene index.
Reindexing is very costly, and I wouldn't do it in one huge, possibly
hours-long transaction.


* You need to have access to the old data (for readers) and the new data
(to re-create the index)
* Eventually, you want to remove the old data (possibly piece by piece)
* You may need to map the structure to a file system, which means separate
directories
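
A rough sketch of how that could look, using a toy in-memory model of the
index definition node. All function names and the dict layout are made up
for illustration and are not Oak API. It also assumes the numeric suffix of
the data node name grows with every reindex:

```python
# Toy model of an index definition node; nothing here is Oak API.
DATA_PREFIX = ":data_"


def ready_data_node(index_def):
    """Return the name of the newest data child marked ready, or None."""
    ready = [
        name for name, props in index_def["children"].items()
        if name.startswith(DATA_PREFIX) and props.get("ready")
    ]
    # Assumption: a larger numeric suffix means a newer data node.
    return max(ready, key=lambda n: int(n[len(DATA_PREFIX):]), default=None)


def start_reindex(index_def, stamp):
    """Set the reindex flag and create a new, not yet ready, data child."""
    index_def["reindex"] = True
    name = f"{DATA_PREFIX}{stamp}"
    index_def["children"][name] = {"ready": False}
    return name


def finish_reindex(index_def, new_name):
    """Mark the new data ready and return the old data children to remove."""
    index_def["children"][new_name]["ready"] = True
    index_def["reindex"] = False
    return [n for n in index_def["children"]
            if n.startswith(DATA_PREFIX) and n != new_name]


def remove_old(index_def, stale):
    """Drop old data children; a real store could do this piece by piece."""
    for name in stale:
        del index_def["children"][name]


index = {"reindex": False, "children": {":data_12345": {"ready": True}}}
new_name = start_reindex(index, 14444)
during = ready_data_node(index)      # readers keep using the old data
stale = finish_reindex(index, new_name)
after = ready_data_node(index)       # readers switch to the new data
remove_old(index, stale)
```

Readers only ever open a data node that is marked ready, so no long
transaction is needed to hide the half-built index from them.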

Regards,
Thomas



On 21/10/14 11:19, "Chetan Mehrotra" <chetan.mehro...@gmail.com> wrote:

>Thanks for the details Thomas!
>
>But the above model differs from the current model, which makes use of
>MVCC. The reindex operation triggers removal of the :data node in the
>branch, and IndexReader always looks for the :data node on trunk to open
>the directory. So while a reindex is in progress, existing readers make
>use of the node, which is not seen as removed in trunk.
>
>What I need is just a way to differentiate index state for a reindex
>call, and that can be managed easily by storing the creation time in
>the index definition node, which works easily with existing logic.
>
>Chetan Mehrotra
>
>
>On Tue, Oct 21, 2014 at 1:51 PM, Thomas Mueller <muel...@adobe.com> wrote:
>> Hi,
>>
>> The node doesn't need to be moved, even after multiple reindex
>> operations. Please note index creation is no different from reindex.
>> In both cases, a new index data node is created. So, if an index
>> definition is created:
>>
>>     /oak:index/lucene
>>
>> Then the index is being built:
>>
>>     /oak:index/lucene/:data_12345
>>
>> The index is done building (a):
>>
>>     /oak:index/lucene/:data_12345/@ready=true
>>
>> Reindexing is started (b):
>>
>>     /oak:index/lucene/@reindex=true
>>     /oak:index/lucene/:data_12345/@ready=true
>>
>>
>> While reindex is in progress:
>>
>>     /oak:index/lucene/@reindex=true
>>     /oak:index/lucene/:data_12345/@ready=true
>>     /oak:index/lucene/:data_14444
>>
>>
>> When reindex is done (matches a):
>>
>>     /oak:index/lucene/:data_14444/@ready=true
>>
>> Reindexing again is just a restart from (b).
>>
>> Regards,
>> Thomas
>>
>> On 21/10/14 10:00, "Chetan Mehrotra" <chetan.mehro...@gmail.com> wrote:
>>
>>>On Tue, Oct 21, 2014 at 1:18 PM, Thomas Mueller <muel...@adobe.com> wrote:
>>>> What we need is a distinction between the old and the new index *data*.
>>>
>>>Yes, and that can be done by storing the index creation time.
>>>
>>>In the approach you suggested, where two different nodes are used and
>>>later renamed, the logic can determine that it's a reindex. Renaming
>>>the node would be fine in this case, as the actual data is stored on
>>>the file system, but if the node contains actual data then such a move
>>>might be costly. For example, in the copy-on-read case the index data
>>>would be stored in the NodeStore and also on the file system. Further,
>>>this is something which each such index implementation would need to
>>>follow.
>>>
>>>Instead, if we just record the creation time in the index definition
>>>node and then allow index impls to make use of that info to
>>>distinguish between a reindex and an incremental index, then that
>>>would serve the same purpose.
>>>
>>>
>>>Chetan Mehrotra
>>
