Hi,

Let's discuss partitioning / sharding in another thread. Asynchronous change merging is not about how to manage huge repositories (for that you need partitioning / sharding); it's about how to manage cluster nodes that are relatively far apart. I'm not sure whether this is the default use case for Jackrabbit. Traditionally, asynchronous change merging (synchronization) is only used if the subsystems are offline for some time, or if there is a noticeable network delay between them, for example when cluster nodes are in different countries.
But I don't want the network to become the new disk (in terms of being the bottleneck, the performance problem). Network delay may be the bottleneck even if cluster nodes are in the same room, especially when you keep the whole repository in memory, or use SSDs. Also, computers get more and more cores, and at some point message passing is more efficient than locking.

Asynchronous operation is bad for reservation systems, banking applications, or if you can't guarantee sticky sessions. There you need synchronous operations, or at least locking. If you want to support both cases in the same repository, you could use virtual repositories (which are also good for partitioning / sharding).

My proposal is for Jackrabbit 3 only. In the extreme case, the "asynchronous change merger" might very well be a separate thread that uses little more than the JCR API. Therefore asynchronous change merging should have very little or no effect on performance when it is not used. On the other hand, replication should probably live in the persistence layer. I think the persistence API should stay synchronous, as it is now.

> We could also use normal UUIDs or SHA1 hashes of the serialized change sets

That's an option, but lookup by node id and time must be efficient. UUIDs / secure hashes are not very space efficient (that might not be the problem). We see from Jackrabbit that indexing random data (UUIDs) is extremely bad for cache locality and index efficiency; if indexing is done by time, that is not a problem. The algorithm I propose is sensitive to configuration changes, but you only need to change the formula when going from "max 256 cluster nodes" to "more than 256 cluster nodes" (for example). And you need a unique cluster id. But I don't think that's a problem.

> we could leverage a virtual time algorithm

I read the paper, but I don't actually understand how to implement it.
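To make the "separate thread, little more than the JCR API" idea concrete, here is a minimal sketch of such a merger as a plain background thread draining a queue of change sets received from other cluster nodes. All names (ChangeSet, applyLocally, offer) are hypothetical, not proposed API; the point is only that the merger needs no new repository internals, since applyLocally could be implemented with an ordinary JCR Session.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: an asynchronous change merger running as a normal thread.
// Change sets from remote cluster nodes are queued and applied locally
// one at a time, so the merger adds no cost when it is not used.
class AsyncChangeMerger implements Runnable {

    static final class ChangeSet {
        final String path;    // affected node path
        final String payload; // serialized change (format not specified here)
        ChangeSet(String path, String payload) {
            this.path = path;
            this.payload = payload;
        }
    }

    private final BlockingQueue<ChangeSet> incoming = new LinkedBlockingQueue<>();
    private volatile boolean running = true;

    // Called by the networking layer when a remote change set arrives.
    void offer(ChangeSet cs) {
        incoming.add(cs);
    }

    void stop() {
        running = false;
    }

    @Override public void run() {
        while (running) {
            try {
                ChangeSet cs = incoming.take();
                applyLocally(cs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    // Placeholder: in a real merger this would apply the change set
    // through a regular JCR session (getNode, setProperty, save, ...).
    void applyLocally(ChangeSet cs) {
    }
}
```

The queue decouples receiving changes from applying them, which is what allows the merger to tolerate slow or temporarily offline peers.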
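To illustrate the kind of formula I mean (this is a sketch with assumed names, not the final design): reserve a few bits of a long for the unique cluster id and use the rest for the timestamp. With 8 bits you get "max 256 cluster nodes"; going beyond that means changing the constant, which is the configuration sensitivity mentioned above. Because the high bits are the time, ids sort by creation time, so index access stays local, unlike with random UUIDs.

```java
// Sketch of a time-ordered change-set id: high bits = timestamp,
// low bits = unique cluster id. CLUSTER_BITS = 8 supports up to
// 256 cluster nodes; raising it changes the id format.
public final class ChangeSetId {

    private static final int CLUSTER_BITS = 8;

    static long newId(long timeMillis, int clusterId) {
        if (clusterId < 0 || clusterId >= (1 << CLUSTER_BITS)) {
            throw new IllegalArgumentException("cluster id out of range");
        }
        return (timeMillis << CLUSTER_BITS) | clusterId;
    }

    // Lookup by time is efficient because ids are ordered by time.
    static long timeOf(long id) {
        return id >>> CLUSTER_BITS;
    }

    static int clusterOf(long id) {
        return (int) (id & ((1 << CLUSTER_BITS) - 1));
    }
}
```

Two ids generated in the same millisecond on different nodes still differ, because the cluster id is part of the value.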
> We'll probably need some mechanism for making the content
> of conflicting changes available for clients to review even if the
> merge algorithm chooses to discard them.

If we leave it up to the client to decide what to do, things might more easily run out of sync. But in any case there might be problems; for example, synchronous event listeners might see a different order of events on different cluster nodes (possibly even different events). It would probably make sense to add some kind of offline comparison / sync feature, similar to rsync. Actually, that could be useful even for Jackrabbit 2.

Regards,
Thomas
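The offline comparison could work roughly like this sketch (hypothetical names, and it assumes each node's subtree can be reduced to a content hash, e.g. over its serialized state): compare the hash maps of two repositories and report paths that exist on only one side or whose content differs, so the repositories can be reconciled after the fact, rsync-style.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of an offline repository comparison: given "path -> content hash"
// maps for two cluster nodes, report where they disagree.
class TreeCompare {

    static Map<String, String> diff(Map<String, String> local,
                                    Map<String, String> remote) {
        // TreeMap keeps the report sorted by path.
        Map<String, String> report = new TreeMap<>();
        for (Map.Entry<String, String> e : local.entrySet()) {
            String other = remote.get(e.getKey());
            if (other == null) {
                report.put(e.getKey(), "only local");
            } else if (!other.equals(e.getValue())) {
                report.put(e.getKey(), "differs");
            }
        }
        for (String path : remote.keySet()) {
            if (!local.containsKey(path)) {
                report.put(path, "only remote");
            }
        }
        return report;
    }
}
```

Like rsync, only the hashes need to travel between nodes; the actual content is fetched only for paths that show up in the report.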