Hi,

> If we set reindex to true in any index definition then Oak would
> remove the existing index content before performing the reindex. This
> would work fine if the index content are stored within NodeStore
> itself.

It's worth adding that, thanks to the MVCC model, this appears as a single
commit (delete + set the reindexed index), so there is no downtime to speak
of: the original index stays available during the reindex process.
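
To make that concrete, here is a rough sketch of the idea, not Oak code. The
class and method names (SnapshotIndex, snapshot, reindex) are made up for the
example, and the in-repo index is reduced to an immutable map that gets
replaced in one atomic step:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch, not Oak code: an in-repo index modelled as an
// immutable snapshot that is replaced in one atomic step.
final class SnapshotIndex {
    private final AtomicReference<Map<String, String>> current =
            new AtomicReference<>(Collections.<String, String>emptyMap());

    // Readers get whatever snapshot was last published.
    Map<String, String> snapshot() {
        return current.get();
    }

    // Build the new index off to the side, then publish it in one step;
    // the old snapshot stays readable until the swap.
    void reindex(Map<String, String> head) {
        Map<String, String> rebuilt = new HashMap<>(head);
        current.set(Collections.unmodifiableMap(rebuilt));
    }
}
```

A reader that grabbed a snapshot before the reindex keeps seeing the old
entries, while new readers see the rebuilt index.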


> However if the index are stored externally e.g. Solr or Lucene index
> with persistence set to filesystem then I think currently we do not
> the remove the existing index data which might lead to index
> containing stale data.

Agreed, this is a problem when storing the index outside the repo. The
interesting part is that only content updates might be affected. A deleted
node will not resurface in query results, because the query engine reloads
each node to check that it is readable by the current session (ACL checks)
and skips the ones it can't read, if I remember correctly.
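
A minimal sketch of that filtering step, assuming I remember the behaviour
correctly. The names (StaleHitFilter, visibleHits) and the plain set of
readable paths are invented for illustration and are not Oak's API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Oak code: drop index hits whose node can no
// longer be loaded (deleted or not readable) by the current session.
final class StaleHitFilter {

    // indexHits: paths returned by the (possibly stale) external index.
    // readablePaths: stand-in for "nodes the session can load at head".
    static List<String> visibleHits(List<String> indexHits, Set<String> readablePaths) {
        List<String> result = new ArrayList<>();
        for (String path : indexHits) {
            if (readablePaths.contains(path)) {
                result.add(path);
            }
        }
        return result;
    }
}
```

So a stale hit for a deleted path is simply skipped, but a hit for a node that
still exists with changed content would pass through.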

Focusing on the Lucene index now, I went through the code a bit (no proper
tests yet) and it looks like it might not be affected by this that much. A
reindex call has an empty before state, so Lucene will update all the
documents it finds [0], which means no stale content on updates. The only
thing missing is the delete events for removed nodes.

So the remaining question is how to identify content that was deleted
between the indexed state and the current head state. One simple solution
is to run a 'remove all documents' query on the Lucene index, but that has
the downside of making the index unusable while the indexing process runs,
so I don't see it as a really good option, only maybe as a fallback of
sorts.
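
Here is a small sketch of both behaviours, under the assumption that the
external index can be reduced to a path -> value map. ExternalReindexSketch
and its methods are made-up names, not Oak or Lucene code:

```java
import java.util.Map;

// Hypothetical sketch, not Oak or Lucene code: the external index is
// modelled as a path -> value map that survives the reindex request.
final class ExternalReindexSketch {

    // Diffing against an empty "before" state reports every current node
    // as added, so each one is (re)written; nothing is reported deleted.
    static void reindex(Map<String, String> externalIndex, Map<String, String> head) {
        for (Map.Entry<String, String> node : head.entrySet()) {
            externalIndex.put(node.getKey(), node.getValue());
        }
    }

    // Fallback: wipe first ("remove all documents"), then rebuild. The
    // index is incomplete for queries while this runs.
    static void clearAndReindex(Map<String, String> externalIndex, Map<String, String> head) {
        externalIndex.clear();
        reindex(externalIndex, head);
    }
}
```

With reindex alone, an entry for a node deleted since the last indexed state
stays in the map; clearAndReindex gets rid of it at the cost of a window
where queries see a partial index.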


> Should we provide any sort of callback for indexers when reindex is
> requested?

Thinking about this a bit, there's a simpler way of handling a reindex
call. If you really need to know whether the current index update is
actually a reindex, you can check on the root index editor whether the
before state is the empty one.
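
Roughly like this. The types below are stand-ins I made up for the example
(loosely shaped like an editor's enter(before, after) callback), not Oak's
actual interfaces, and I'm assuming the empty before state shows up as a
non-existent one, which would need checking against the real code:

```java
// Hypothetical stand-in for a node state; this is not Oak's actual API.
interface SketchNodeState {
    boolean exists();
}

// Hypothetical root index editor that remembers whether it was started
// by a reindex request.
final class ReindexAwareEditor {
    private boolean reindexing;

    // Called once on the root editor before the diff is walked.
    void enter(SketchNodeState before, SketchNodeState after) {
        // Assumption: a reindex diffs against an empty (non-existent)
        // "before" state, while an incremental update has a real one.
        reindexing = !before.exists();
        if (reindexing) {
            // e.g. clear or swap the external index here
        }
    }

    boolean isReindexing() {
        return reindexing;
    }
}
```

An external indexer could use that flag to decide whether to clean up its
own storage, without needing a dedicated callback.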

best,
alex

[0]
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexEditor.java#L109



On Mon, Oct 13, 2014 at 7:33 AM, Chetan Mehrotra <chetan.mehro...@gmail.com>
wrote:

> Hi,
>
> If we set reindex to true in any index definition then Oak would
> remove the existing index content before performing the reindex. This
> would work fine if the index content are stored within NodeStore
> itself.
>
> However if the index are stored externally e.g. Solr or Lucene index
> with persistence set to filesystem then I think currently we do not
> the remove the existing index data which might lead to index
> containing stale data.
>
> Should we provide any sort of callback for indexers when reindex is
> requested?
>
> Chetan Mehrotra
>
