Hi,

Yes, that's my point. I wouldn't use MVCC for reindexing the Lucene index. Reindexing is very costly, and I wouldn't do it in one huge, possibly hours-long transaction.
* You need to have access to the old and (for readers) the new data (to
  re-create the index)
* Eventually, you want to remove the old data (possibly piece by piece)
* You may need to map the structure to a file system, which means separate
  directories

Regards,
Thomas

On 21/10/14 11:19, "Chetan Mehrotra" <chetan.mehro...@gmail.com> wrote:

>Thanks for the details Thomas!
>
>But the above model differs from the current model, which makes use of
>MVCC. The reindex operation triggers removal of the :data node in the
>branch, and IndexReader always looks for the :data node to open the
>directory on trunk. So while a reindex is in progress, existing readers
>make use of the node, which is not seen as removed in trunk.
>
>What I need is just a way to differentiate index state for a reindex
>call, and that can be managed easily by storing the creation time in
>the index definition node, which works easily with the existing logic.
>
>Chetan Mehrotra
>
>
>On Tue, Oct 21, 2014 at 1:51 PM, Thomas Mueller <muel...@adobe.com> wrote:
>> Hi,
>>
>> The node doesn't need to be moved, even after multiple reindex
>> operations. Please note index creation is no different from reindex.
>> In both cases, a new index data node is created. So, if an index
>> definition is created:
>>
>> /oak:index/lucene
>>
>> Then the index is being built:
>>
>> /oak:index/lucene/:data_12345
>>
>> The index is done building (a):
>>
>> /oak:index/lucene/:data_12345/@ready=true
>>
>> Reindexing is started (b):
>>
>> /oak:index/lucene/@reindex=true
>> /oak:index/lucene/:data_12345/@ready=true
>>
>> While reindex is in progress:
>>
>> /oak:index/lucene/@reindex=true
>> /oak:index/lucene/:data_12345/@ready=true
>> /oak:index/lucene/:data_14444
>>
>> When reindex is done (matches a):
>>
>> /oak:index/lucene/:data_14444/@ready=true
>>
>> Reindex again is just restart from (b).
>>
>> Regards,
>> Thomas
>>
>> On 21/10/14 10:00, "Chetan Mehrotra" <chetan.mehro...@gmail.com> wrote:
>>
>>>On Tue, Oct 21, 2014 at 1:18 PM, Thomas Mueller <muel...@adobe.com>
>>>wrote:
>>>> What we need is a distinction between the old and the new index
>>>> *data*.
>>>
>>>Yes, and that can be done by storing the index creation time.
>>>
>>>The approach you suggested, where two different nodes are used and the
>>>nodes are later renamed, allows the logic to determine that it's a
>>>reindex. Renaming the node would be fine in this case, as the actual
>>>data is stored on the file system, but if the node contains actual data
>>>then such a move might be costly. E.g. in the copy-on-read case the
>>>index data would be stored in the NodeStore and also on the file
>>>system. Further, this is something which each such index implementation
>>>would need to follow.
>>>
>>>Instead, if we just record the creation time in the index definition
>>>node and then allow index impls to make use of that info to
>>>distinguish between a reindex and an incremental index, then that would
>>>serve the same purpose.
>>>
>>>Chetan Mehrotra
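
For illustration, the :data_<id> / @ready lifecycle from Thomas's quoted message could be modelled roughly as below. This is a minimal in-memory sketch in Python, not Oak code: the IndexDefinition class and its method names are hypothetical, and dropping the old data node in one step once the new one is ready is a simplifying assumption (the thread suggests removal may happen piece by piece).

```python
class IndexDefinition:
    """Hypothetical in-memory stand-in for /oak:index/lucene (not an Oak API)."""

    def __init__(self):
        self.reindex = False   # models the @reindex property
        self.data_nodes = {}   # child node name -> @ready flag

    def request_reindex(self):
        # State (b): flag the definition; the ready data node stays in place.
        self.reindex = True

    def start_build(self, suffix):
        # A new, not-yet-ready data node is created next to any existing one.
        name = f":data_{suffix}"
        self.data_nodes[name] = False
        return name

    def finish_build(self, name):
        # State (a): mark the new node ready, then drop older data nodes.
        # Assumption: old data is removed in one step here.
        self.data_nodes[name] = True
        for other in list(self.data_nodes):
            if other != name:
                del self.data_nodes[other]
        self.reindex = False

    def reader_node(self):
        # Readers only ever open a node that is marked ready.
        for name, ready in self.data_nodes.items():
            if ready:
                return name
        return None
```

In this sketch, a reader that calls reader_node() while a reindex is running keeps getting the old node, because the new node is not marked ready until finish_build() is called. Initial index creation and reindexing go through the same start_build() / finish_build() pair, which mirrors the point that index creation is no different from reindex.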