Re: Asynchronous indexing consistency

Jukka Zitting Wed, 29 May 2013 04:53:33 -0700

Hi,

On Wed, May 29, 2013 at 12:28 PM, Thomas Mueller <muel...@adobe.com> wrote:
> What would happen if you stop a cluster node for a long time (for example
> 1 day)? Would async indexing be done on another cluster node? If yes, I
> guess we need a way to ensure that's the case. If not, then the problem
> might be that old revisions are no longer available.


There are various reasons why an indexer might be unavailable for an
extended period, so in any case I think we need some mechanism for it
to pick up from where it left off. As you mentioned, journaled
observation will need a similar mechanism.

I see at least the following options:

a) If, like in the Segment and H2 MKs, we could rely on the MKs
supporting cheap copies and diffs across subtrees, we could implement
this without API changes by keeping a copy of the last indexed/seen
state of the repository in a hidden subtree. The indexer would refresh
this copy on each index update, and could thus always know what
content has already been indexed. Unfortunately there probably isn't
any easy way to do this in the MongoMK.

b) Have some way to mark specific revisions as ones that should be
kept around for a longer time (e.g. using a lease mechanism). The
indexer could then store such a revision id as a part of an index
update as a record of what content was last indexed.

c) Keep a log of all changes since the last index update. This is
probably the least attractive solution as it adds quite a bit of write
overhead and, unless the log is maintained by the MK, we'd still have
to worry about potential lost updates.
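
To make option b) a bit more concrete, here is a minimal Python sketch
of how a lease on a revision could let an indexer resume after a long
pause. Everything in it is hypothetical and only for illustration:
RevisionStore, AsyncIndexer, lease(), release(), gc() and diff() are
made-up names, not MK or Oak APIs, and the toy store keeps full
snapshots per revision and ignores deletions.

```python
class RevisionStore:
    """Toy stand-in for an MK (hypothetical): one full snapshot per revision."""

    def __init__(self):
        self._revisions = {0: {}}  # revision id -> {path: value}
        self._head = 0
        self._leases = {}          # revision id -> lease expiry time (seconds)

    def head(self):
        return self._head

    def commit(self, changes):
        """Create a new head revision with the given path -> value changes."""
        snapshot = dict(self._revisions[self._head])
        snapshot.update(changes)
        self._head += 1
        self._revisions[self._head] = snapshot
        return self._head

    def lease(self, revision, now, lifetime):
        """Ask the store to keep `revision` around until now + lifetime."""
        if revision not in self._revisions:
            raise KeyError("revision %d is no longer available" % revision)
        self._leases[revision] = now + lifetime

    def release(self, revision):
        self._leases.pop(revision, None)

    def gc(self, now):
        """Drop every revision that is neither the head nor validly leased."""
        for rev in list(self._revisions):
            if rev == self._head or self._leases.get(rev, 0) > now:
                continue
            del self._revisions[rev]
            self._leases.pop(rev, None)

    def diff(self, old, new):
        """Paths added or changed between two revisions (deletions ignored)."""
        before, after = self._revisions[old], self._revisions[new]
        return {p: v for p, v in after.items() if before.get(p) != v}


class AsyncIndexer:
    """Indexes changes since the last leased revision (hypothetical)."""

    def __init__(self, store, lease_lifetime):
        self.store = store
        self.lease_lifetime = lease_lifetime
        self.index = {}
        self.last_indexed = 0
        store.lease(0, now=0, lifetime=lease_lifetime)

    def run(self, now):
        head = self.store.head()
        # Only content changed since the last indexed revision is processed.
        self.index.update(self.store.diff(self.last_indexed, head))
        # Record the new revision as "last indexed" and lease it, then let
        # go of the previous one so the store can garbage-collect it.
        self.store.lease(head, now, self.lease_lifetime)
        if head != self.last_indexed:
            self.store.release(self.last_indexed)
        self.last_indexed = head


DAY = 24 * 60 * 60
store = RevisionStore()
indexer = AsyncIndexer(store, lease_lifetime=2 * DAY)

store.commit({"/a": 1})
indexer.run(now=10)          # indexes revision 1 and leases it

store.commit({"/b": 2})
store.commit({"/a": 3})
store.gc(now=DAY)            # indexer was down for a day; GC still runs
indexer.run(now=DAY)         # resumes from the leased revision 1
```

The point of the sketch is that the garbage collector drops the
unleased intermediate revision but keeps the one the indexer recorded,
so the diff from the last indexed revision to the head is still
possible after the pause. If the indexer stays away longer than the
lease lifetime, the revision is lost and a full reindex would be
needed, so the lifetime would have to be chosen (or renewed)
accordingly.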

BR,

Jukka Zitting
