Hi,

On Wed, Mar 6, 2013 at 5:33 PM, Thomas Mueller <muel...@adobe.com> wrote:
> I wonder what is the state of the implementation of merge operations for
> segments and journals, and how are merges scheduled?

Basic merging functionality for the in-memory segment store is in
place; see o.a.j.oak.plugins.segment.JournalTest for some examples of
how it works. Currently the merge operation needs to be explicitly
triggered.

Before implementing the same mechanism for the MongoDB store, I've
been experimenting with a few alternatives on how to schedule and
process merges. The most straightforward approach is to use periodic
merges with a configurable interval (defaulting to something like one
second), with the option to explicitly trigger extra merges when
needed, and to process merges using our existing rebase logic that
leaves conflict markers in the tree when unresolvable conflicts are
detected.
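To make the scheduling part concrete, here is a minimal sketch of what such periodic merging could look like. This is purely illustrative: the MergeScheduler class, the Runnable merge task, and the constructor parameters are my own placeholders, not actual SegmentMK code.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch of periodic merge scheduling with a configurable
 * interval, plus the option to explicitly trigger extra merges.
 */
public class MergeScheduler {

    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();

    private final Runnable mergeTask;

    public MergeScheduler(Runnable mergeTask, long intervalMillis) {
        this.mergeTask = mergeTask;
        // Periodic merges with a configurable interval
        // (defaulting to something like one second)
        executor.scheduleWithFixedDelay(
                mergeTask, intervalMillis, intervalMillis,
                TimeUnit.MILLISECONDS);
    }

    /** Explicitly triggers an extra merge when needed. */
    public void mergeNow() {
        executor.execute(mergeTask);
    }

    public void close() throws InterruptedException {
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```

A single-threaded executor keeps merges serialized, so an explicitly triggered merge never runs concurrently with a periodic one.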

However, as noted in OAK-633, there are a few conceptual problems with
this approach to processing merges:

a) Since validators and other commit hooks are not run during the
merge, the result can be an internally inconsistent content tree
(dangling references, incorrect permission store, etc.)

b) The presence of conflict markers will prevent further changes to
affected nodes until the conflict gets resolved

c) There's no good way to handle more than one set of conflicts per node

So, apart from problem a (which also affects the new MongoMK), the
current mechanism works fine (i.e. fully parallel writes) as long as
the changes are non-conflicting, but runs into trouble when there are
conflicts.

So far I've come up with the following alternative designs to address
the above problems:

* Use a more aggressive merge algorithm that automatically resolves
all conflicts by throwing away (or storing somewhere else) "less
important" changes when needed. This addresses problems b and c;
problem a remains an issue.
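As a rough illustration of that first design, the resolver below keeps the change it considers more important (here, simply the one with the later commit timestamp, which is only one possible policy) and parks the losing change in a side queue rather than leaving a conflict marker. All names and the Change type are hypothetical, not Oak classes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Illustrative sketch of an "aggressive" conflict resolver: pick a
 * winner automatically and store the discarded change elsewhere, so
 * no conflict marker blocks further writes to the node.
 */
public class AggressiveResolver {

    /** Hypothetical stand-in for a conflicting change to a node. */
    public static final class Change {
        final String value;
        final long timestamp;
        public Change(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Discarded changes are kept rather than lost outright.
    private final Deque<Change> discarded = new ArrayDeque<>();

    /** Returns the winning change; the loser goes to the side queue. */
    public Change resolve(Change ours, Change theirs) {
        Change winner = ours.timestamp >= theirs.timestamp ? ours : theirs;
        Change loser = (winner == ours) ? theirs : ours;
        discarded.push(loser);
        return winner;
    }

    public Deque<Change> discarded() {
        return discarded;
    }
}
```

Since every conflict produces a winner, repeated conflicts on the same node are no longer a special case, which is how this addresses problems b and c.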

* Instead of merging the full set of changes from another journal, we
could keep track of what the Oak client saved vs. what actually got
committed (after hook processing) and then during a merge try to
replay just those client changes before re-running the commit hooks. A
change set that fails because of merge conflicts or validation issues
could be moved to a separate conflict queue for later (possibly
manual) processing. This approach would solve all the above problems,
but could cause some surprises as the repository might occasionally
"undo" commits that previously passed without problems.
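The replay-based design could be sketched roughly as follows. Again this is a hedged illustration under my own assumptions: ChangeSet, applyTo(), and the commit hook parameter are placeholders for whatever the real recorded change sets and hook pipeline would be.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.UnaryOperator;

/**
 * Sketch of the replay approach: apply each recorded client change
 * set on top of the merge target, re-run the commit hooks, and move
 * failing change sets to a conflict queue for later (possibly manual)
 * processing instead of aborting the merge.
 */
public class ReplayMerge {

    /** Placeholder for a recorded client change set. */
    public interface ChangeSet {
        /** Applies this change set to the given state,
            throwing on a merge conflict. */
        String applyTo(String base) throws Exception;
    }

    private final Queue<ChangeSet> conflictQueue = new ArrayDeque<>();

    /**
     * Replays the client changes on the new base, running the commit
     * hook after each; failures are queued rather than fatal.
     */
    public String replay(String base, Iterable<ChangeSet> changes,
                         UnaryOperator<String> commitHook) {
        String state = base;
        for (ChangeSet change : changes) {
            try {
                state = commitHook.apply(change.applyTo(state));
            } catch (Exception conflictOrValidationFailure) {
                conflictQueue.add(change);  // handle later, possibly manually
            }
        }
        return state;
    }

    public Queue<ChangeSet> conflictQueue() {
        return conflictQueue;
    }
}
```

The surprising behavior mentioned above shows up here too: a change set that committed cleanly on its original journal can still land in the conflict queue when replayed against a different base.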

Given these tradeoffs I'm thinking that a default SegmentMK deployment
should start with just the root journal, and other journals (and their
merge behavior) should be configured depending on the requirements and
characteristics of particular deployment scenarios. We need to come up
with some better distributed test cases to determine what such
scenarios would look like and what the best journal configurations for
them would be.

BR,

Jukka Zitting
