Re: Asynchronous indexing consistency

Jukka Zitting Wed, 29 May 2013 05:48:37 -0700

Hi,

On Wed, May 29, 2013 at 3:01 PM, Thomas Mueller <muel...@adobe.com> wrote:
>>There could be various reasons for why an indexer might not be
>>available for an extended amount of time
>
> Possibly you are right, but let me try to challenge this assumption:
>
> Wouldn't it be a problem if the index isn't updated for a long time?


Not necessarily. For example, during large batch imports or content
migrations it might be useful to be able to speed things up by
disabling things like full text indexing. Or it could be that an
external index server like Solr is down for maintenance or other
reasons. Such cases would obviously lead to some loss of
functionality, but probably wouldn't be too troublesome if the
relevant indexers were able to automatically pick up from where they
left off.

> Don't we need a protection against an outdated index?

Any asynchronous index will in any case need to be resilient against
some mismatch between the index and repository content. Whether that
mismatch is measured in minutes or days should be irrelevant to the
index implementation; the only impact would be on the freshness
assumptions that applications or end users might have.

>>a) If, like in the Segment and H2 MKs, we could rely on the MKs
>>supporting cheap copies and diffs across subtrees, we could implement
>>this without API changes by keeping a copy of the last indexed/seen
>>state of the repository in a hidden subtree. The indexer would refresh
>>this copy on each index update, and could thus always know what
>>content has already been indexed. Unfortunately there probably isn't
>>any easy way to do this in the MongoMK.
>
> It sounds like reading with old revisions.

Not really; let me rephrase. What I'm suggesting is something like this:

    NodeState root = branch.getHead();
    NodeState index =
        root.getChildNode("oak:index").getChildNode("someIndex");
    // last indexed state of the repository, kept in a hidden child node
    NodeState before = index.getChildNode(":before");

    NodeBuilder rootBuilder = root.builder();
    NodeBuilder indexBuilder =
        rootBuilder.getChildNode("oak:index").getChildNode("someIndex");
    // feed everything that changed since ":before" to the index
    root.compareAgainstBaseState(
        before, new IndexUpdate(indexBuilder.getChildNode(":index")));
    // remember the current root as the new baseline
    indexBuilder.setChildNode(":before", root);

    branch.setRoot(rootBuilder.getNodeState());
    branch.merge();

I.e. instead of tracking things by revision, we'd just make a full
copy of the entire content tree that has already been indexed.
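
To make the "cheap copy" part more concrete, here is a minimal,
self-contained sketch of why this works well when trees are immutable
and shared by reference. Everything in it (CheckpointSketch, Node,
diff, indexOnce) is a made-up stand-in for illustration and not the
Oak API; I'm assuming nodes without properties and a simple
"+/~/-" change log in place of a real index update.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CheckpointSketch {

    /** Hypothetical immutable tree node; a stand-in for NodeState, not the Oak API. */
    static final class Node {
        static final Node EMPTY = new Node(Collections.<String, Node>emptyMap());
        final Map<String, Node> children;

        Node(Map<String, Node> children) {
            this.children = Collections.unmodifiableMap(new TreeMap<>(children));
        }

        /** Returns a copy with one child set; all other children are shared by reference. */
        Node with(String name, Node child) {
            Map<String, Node> copy = new TreeMap<>(children);
            copy.put(name, child);
            return new Node(copy);
        }
    }

    /** Records added (+), changed (~) and removed (-) paths between two trees. */
    static void diff(String path, Node before, Node after, List<String> out) {
        if (before == after) {
            return; // same object: the whole subtree is unchanged, skip it
        }
        for (Map.Entry<String, Node> e : after.children.entrySet()) {
            String childPath = path + "/" + e.getKey();
            Node old = before.children.get(e.getKey());
            if (old == null) {
                out.add("+" + childPath);
                diff(childPath, Node.EMPTY, e.getValue(), out);
            } else if (old != e.getValue()) {
                out.add("~" + childPath);
                diff(childPath, old, e.getValue(), out);
            }
        }
        for (String name : before.children.keySet()) {
            if (!after.children.containsKey(name)) {
                out.add("-" + path + "/" + name);
            }
        }
    }

    /** One indexing pass: diff against the checkpoint, then return the new checkpoint. */
    static Node indexOnce(Node checkpoint, Node head, List<String> out) {
        diff("", checkpoint, head, out);
        return head; // "copying" the indexed state is just keeping a reference
    }

    public static void main(String[] args) {
        Node docs = Node.EMPTY.with("a", Node.EMPTY);
        Node head = Node.EMPTY.with("content", docs).with("other", Node.EMPTY);

        List<String> first = new ArrayList<>();
        Node checkpoint = indexOnce(Node.EMPTY, head, first);
        System.out.println(first);  // [+/content, +/content/a, +/other]

        // The indexer may have been offline for any length of time here.
        Node head2 = head.with("content", docs.with("b", Node.EMPTY));

        List<String> second = new ArrayList<>();
        checkpoint = indexOnce(checkpoint, head2, second);
        System.out.println(second); // [~/content, +/content/b]
    }
}
```

Note that the second pass never descends into /other, since the
checkpoint and the new head share that subtree as the same object.
That sharing is what makes both the copy and the diff cheap here.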

Unfortunately, AFAICT, in MongoMK this would require the duplication
of the entire subtree instead of the copy by reference that the
Segment and H2 MKs could do.

BR,

Jukka Zitting
