Hi,
> If we set reindex to true in any index definition then Oak would
> remove the existing index content before performing the reindex. This
> would work fine if the index content are stored within NodeStore
> itself.

It is also worth pointing out that this appears as a single commit
thanks to the MVCC model (delete + set reindexed index), so there's no
downtime to speak of: the original index stays available during the
reindex process.

> However if the index are stored externally e.g. Solr or Lucene index
> with persistence set to filesystem then I think currently we do not
> the remove the existing index data which might lead to index
> containing stale data.

Agreed, this is a problem when storing the index outside the repo.
The interesting part here is that only content updates might be
affected. A deleted node will not resurface in query results, because
the query engine reloads nodes to check that they are readable by the
current session (ACL checks) and skips over the ones it can't read, if
I remember correctly.

Focusing on the Lucene index now, I went through the code a bit (no
proper tests yet) and it looks like it might not be affected by this
that much. A reindex call has an empty before state, so Lucene will
update all the documents it finds [0], which means no stale content on
updates here. Only the delete events for removed nodes are missed.

So the remaining question is how to identify content that was deleted
between the indexed state and the current head state. One simple
solution is to run a 'remove all documents' query on the Lucene index,
but that has the downside of making the index unusable while the
indexing process runs, so I don't see it as a really good option, only
maybe as a fallback of sorts.

> Should we provide any sort of callback for indexers when reindex is
> requested?

Thinking about this a bit, there's a simpler way of handling a reindex
call.
If you really need to know that the current index run is actually a
reindex call, you can check if the before state is the empty one on the
root index editor.

best,
alex

[0] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexEditor.java#L109

On Mon, Oct 13, 2014 at 7:33 AM, Chetan Mehrotra <chetan.mehro...@gmail.com> wrote:
> Hi,
>
> If we set reindex to true in any index definition then Oak would
> remove the existing index content before performing the reindex. This
> would work fine if the index content are stored within NodeStore
> itself.
>
> However if the index are stored externally e.g. Solr or Lucene index
> with persistence set to filesystem then I think currently we do not
> the remove the existing index data which might lead to index
> containing stale data.
>
> Should we provide any sort of callback for indexers when reindex is
> requested?
>
> Chetan Mehrotra
>
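
PS: to make the before-state check from my reply a bit more concrete,
here is a rough, self-contained sketch. It does not use the real Oak
API. NodeStateLike, ExternalIndex and RootIndexEditor are hypothetical
stand-ins I made up for illustration, and clearing the index on a
detected reindex is just the 'remove all documents' fallback mentioned
above, with the same downside of an unusable index while the reindex
runs.

```java
import java.util.Map;
import java.util.TreeMap;

public class ReindexCheckSketch {

    /** Hypothetical, simplified stand-in for a node state (not Oak's NodeState). */
    interface NodeStateLike {
        boolean exists();
    }

    /** Plays the role of the empty before state that a reindex call passes in. */
    static final NodeStateLike MISSING = () -> false;

    /** Plays the role of any state that actually exists. */
    static final NodeStateLike PRESENT = () -> true;

    /** Hypothetical externally stored index: node path -> indexed content. */
    static final class ExternalIndex {
        final Map<String, String> docs = new TreeMap<>();
    }

    /** Hypothetical root index editor that detects a reindex on enter(). */
    static final class RootIndexEditor {
        private final ExternalIndex index;
        private boolean reindex;

        RootIndexEditor(ExternalIndex index) {
            this.index = index;
        }

        /** Called once at the root with the before and after states. */
        void enter(NodeStateLike before, NodeStateLike after) {
            // An empty before state is taken to mean "this is a reindex".
            reindex = !before.exists();
            if (reindex) {
                // Fallback only: drop everything so deleted nodes cannot linger.
                index.docs.clear();
            }
        }

        /** Adds or overwrites the document for a path found during the diff. */
        void update(String path, String content) {
            index.docs.put(path, content);
        }

        boolean isReindex() {
            return reindex;
        }
    }

    public static void main(String[] args) {
        ExternalIndex index = new ExternalIndex();
        index.docs.put("/a", "old");
        index.docs.put("/deleted", "stale"); // node no longer in the repository

        RootIndexEditor editor = new RootIndexEditor(index);
        editor.enter(MISSING, PRESENT); // reindex: before state is empty
        editor.update("/a", "new");     // only nodes in the head state are seen

        System.out.println(editor.isReindex() + " " + index.docs);
    }
}
```

Without the clear() call the "/deleted" entry would survive the
reindex, which is exactly the missed-delete case described above.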