Hi,

On Wed, Mar 6, 2013 at 5:33 PM, Thomas Mueller <muel...@adobe.com> wrote:
> I wonder what is the state of the implementation of merge operations
> for segments and journals, and how are merges scheduled?
Basic merging functionality for the in-memory segment store is there; see o.a.j.oak.plugins.segment.JournalTest for some examples of how it works. Currently the merge operation needs to be explicitly triggered.

Before implementing the same mechanism for the MongoDB store, I've been experimenting with a few alternatives for how to schedule and process merges. The most straightforward approach is to use periodic merges with a configurable interval (defaulting to something like one second), with the option to explicitly trigger extra merges when needed, and to process merges using our existing rebase logic that leaves conflict markers in the tree when unresolvable conflicts are detected.

However, as noted in OAK-633, there are a few conceptual problems with this approach to processing merges:

a) Since validators and other commit hooks are not run during the merge, the result can be an internally inconsistent content tree (dangling references, incorrect permission store, etc.)

b) The presence of conflict markers will prevent further changes to affected nodes until the conflict gets resolved

c) There's no good way to handle more than one set of conflicts per node

So, apart from problem a (which also affects the new MongoMK), the current mechanism works fine (i.e. fully parallel writes) as long as the changes are non-conflicting, but runs into trouble when there are conflicts.

So far I've come up with the following alternative designs to address the above problems:

* Use a more aggressive merge algorithm that automatically resolves all conflicts by throwing away (or storing somewhere else) "less important" changes when needed. This addresses problems b and c; problem a is still an issue.

* Instead of merging the full set of changes from another journal, we could keep track of what the Oak client saved vs. what actually got committed (after hook processing), and then during a merge try to replay just those client changes before re-running the commit hooks.
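As a quick illustration of the periodic-with-explicit-trigger scheduling I mentioned above, here's a minimal sketch using a standard ScheduledExecutorService. The Journal interface and its merge() method are hypothetical placeholders for whatever the store ends up exposing, not the actual SegmentMK API:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class MergeScheduler {

    // hypothetical stand-in for the real journal/merge API
    interface Journal {
        void merge();
    }

    private final Journal journal;
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();

    MergeScheduler(Journal journal, long intervalMillis) {
        this.journal = journal;
        // periodic merges at a configurable interval
        // (defaulting to something like one second)
        executor.scheduleWithFixedDelay(
                journal::merge, intervalMillis, intervalMillis,
                TimeUnit.MILLISECONDS);
    }

    // explicitly trigger an extra merge when needed
    void triggerMerge() {
        executor.execute(journal::merge);
    }

    void shutdown() {
        executor.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger merges = new AtomicInteger();
        MergeScheduler scheduler =
                new MergeScheduler(merges::incrementAndGet, 100);
        scheduler.triggerMerge();  // one explicit merge
        Thread.sleep(350);         // let a few periodic merges run
        scheduler.shutdown();
        System.out.println(merges.get() >= 2);
    }
}
```

Using a single-threaded executor keeps merges serialized, so an explicit trigger never races with a periodic one. Back to the second alternative design: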
A change set that fails because of merge conflicts or validation issues could be moved to a separate conflict queue for later (possibly manual) processing. This approach would solve all of the above problems, but could cause some surprises, as the repository might occasionally "undo" commits that previously passed without problems.

Given these tradeoffs, I'm thinking that a default SegmentMK deployment should start with just the root journal, and that other journals (and their merge behavior) should be configured depending on the requirements and characteristics of particular deployment scenarios. We need to come up with some better distributed test cases to determine what such scenarios would look like and what the best journal configurations for them would be.

BR,

Jukka Zitting