Hi,

On Tue, Oct 19, 2010 at 12:24 PM, Thomas Müller <[email protected]> wrote:
> The current Jackrabbit clustering doesn't scale well for writes
> because all cluster nodes use the same persistent storage. Even if
> persistence storage is clustered, the cluster journal relies on
> changes being immediately visible in all nodes. That means Jackrabbit
> clustering can scale well for reads, however it can't scale well for
> writes. This is a property Jackrabbit clustering shares with most
> clustering solutions for relational databases. Still, it would make
> sense to solve this problem for Jackrabbit 3.

Agreed. The advent of the read/write web has notably increased the
importance of scalable write functionality in web backends. We aren't
seeing the full impact of this yet, but I know that many of our users
are rolling out new sites and other applications with all sorts of
commenting, tracking and social features, and that such deployments
will sooner or later start hitting our current write bottleneck.

> == Jackrabbit 3 Clustering ==
>
> [Cluster Node 1] <--> [ Local Storage ]
> [Cluster Node 2] <--> [ Local Storage ]

I'd even like to float the idea of the local storage of each cluster
node being RAM instead of a database or the file system. Instead of
persisting changes to disk, durability could be achieved by syncing
the changes to at least one or two other cluster nodes. But that's
probably best discussed in another thread...

> == Unique Change Set Ids ==
> [...]
> changeSetId = nanosecondsSince1970 * totalClusterNodes + clusterNodeId

We could also use normal UUIDs or SHA-1 hashes of the serialized
change sets as these identifiers, as long as we include timestamp
information (and perhaps the identity of the originating cluster node)
with the changes. That way you wouldn't have to make assumptions about
the cluster configuration in advance.
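To make that a bit more concrete, here's a rough sketch (in Java) of
what such an identifier could look like. All class and field names
here are made up for illustration, not a proposal for actual APIs:

    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    // Sketch: identify a change set by the SHA-1 hash of its
    // serialized form, carrying the timestamp and the originating
    // cluster node alongside as ordering hints.
    public class ChangeSetId {

        private final byte[] hash;       // SHA-1 of the serialized change set
        private final long timestamp;    // creation time, an ordering hint
        private final String originNode; // originating cluster node

        public ChangeSetId(byte[] serializedChangeSet, long timestamp,
                           String originNode) throws NoSuchAlgorithmException {
            this.hash = MessageDigest.getInstance("SHA-1")
                    .digest(serializedChangeSet);
            this.timestamp = timestamp;
            this.originNode = originNode;
        }
    }

The hash alone is enough to make the identifier unique, while the
timestamp and node identity travel with the changes as merge hints,
so nothing about the cluster topology needs to be fixed up front.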
> == How to Merge Changes ==
> [...]
> Changes with change set ids in the future are delayed. Cluster nodes
> should have reasonably synchronized clocks (it doesn't need to be
> completely exact, but it should be reasonably accurate, so that such
> delayed events are not that common).

Instead of relying on clock synchronization (many virtual servers
suffer from serious clock drift), we could leverage a virtual time
algorithm like the one described in [1].

[1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.2620
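Even a simple Lamport-style logical clock (a much simpler relative of
the virtual time algorithm in [1], minus the rollback machinery) would
give us an ordering of change sets without synchronized wall clocks.
A minimal sketch, again with made-up names:

    import java.util.concurrent.atomic.AtomicLong;

    // Minimal Lamport-style logical clock: tick() stamps locally
    // created change sets, update() advances the clock past the
    // timestamp of any change set received from another node.
    public class LogicalClock {

        private final AtomicLong time = new AtomicLong();

        // Called when this node creates a change set.
        public long tick() {
            return time.incrementAndGet();
        }

        // Called when a change set arrives from another cluster node.
        public long update(long remoteTime) {
            return time.updateAndGet(t -> Math.max(t, remoteTime) + 1);
        }
    }

Change sets with equal logical timestamps could then be ordered by the
originating cluster node id to get a total order across the cluster.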
> == Solution A: Node Granularity, Ignore Old Changes ==

As you mentioned, this is problematic.

> == Solution B: Merge Old Changes ==

This sounds promising, but needs to be reviewed for all the potential
conflicts. We'll probably need some mechanism for making the content
of conflicting changes available for clients to review even if the
merge algorithm chooses to discard them.

BR,

Jukka Zitting