Hi,

> In various conversations on the topic of persistence I
> observed that horizontal, free scalability in a cluster
> for both reads and writes is a topic that we need to keep
> in mind.

If you mean parallel reads / writes: each cluster node would need its
own independent storage location (multiple hard drives, or logical
locations when using a scalable file system).

Parallel reads are possible by replicating data to multiple cluster
nodes. It's not required to always keep all data on all cluster nodes,
just the data that is accessed a lot (caching).

Parallel writes are harder. I'm not sure if it makes sense to support
them at all. Maybe they could be implemented using a smart "cache
invalidation algorithm" ('invalidate all nodes below node x'). Once a
cluster node knows it's the only one (*) that has the data for a given
node, it doesn't need to replicate that data or propagate changes about
this node to other cluster nodes.

(*) "the only one" is very bad for failover. Changes should be stored
on at least two cluster nodes, but not necessarily on all of them.
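To make the "invalidate all nodes below node x" idea a bit more
concrete, here is a minimal sketch. ClusterNode, store and
invalidateSubtree are hypothetical names for illustration only; nothing
like this exists in Jackrabbit today:

import java.util.List;

public class ReplicatingWriteHandler {

    /** Minimum number of cluster nodes that must hold a change (failover). */
    private static final int MIN_COPIES = 2;

    private final List<ClusterNode> clusterNodes;  // all known cluster nodes

    public ReplicatingWriteHandler(List<ClusterNode> clusterNodes) {
        this.clusterNodes = clusterNodes;
    }

    /** Stores a change on at least two cluster nodes, invalidates the rest. */
    public void write(String nodePath, byte[] data) {
        int copies = 0;
        for (ClusterNode cn : clusterNodes) {
            if (copies < MIN_COPIES) {
                // keep at least two copies so a single node failure loses nothing
                cn.store(nodePath, data);
                copies++;
            } else {
                // "invalidate all nodes below node x": the other nodes drop any
                // cached items under the changed path instead of receiving data
                cn.invalidateSubtree(nodePath);
            }
        }
    }

    /** Minimal stand-in for a remote cluster node. */
    public interface ClusterNode {
        void store(String nodePath, byte[] data);
        void invalidateSubtree(String nodePath);
    }
}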
> Of course I also think that based on the experience with
> the current persistence model we need to make sure
> that we deliver a scalable solution for all aspects
> of the JCR API where it employs RangeIterators. This
> includes lists of child nodes, references and the like.

I agree. Long term we should solve the current limitations. We should
still optimize Jackrabbit for the most common use case (which is
probably a low number of child nodes).

> I would like to find out if we can take an iterative,
> evolutionary approach to a more efficient and more
> scalable persistence.

I once tried to change Jackrabbit to process node reference deltas
instead of always the whole list (when adding or removing node
references). My experience is that a lot of code needs to be changed to
get a working solution. Changing the code is also dangerous because we
don't have an extensive test suite yet.

> As next steps I would like to propose that we build
> an option that allows for an index of the cluster that
> allows us to build a journal-backed persistence manager
> using the current PM interface, which would essentially
> have a no-op for writes.

This requires that all data is stored in the change log. The change log
could become a bottleneck for parallel reads, but that could be solved
by caching (if it is in fact a problem). See the sketch in the P.S.
below.

> making sure that information is only persisted once

I think this is a good plan.

Regards,
Thomas
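P.S. To make the "no-op for writes" idea more concrete, here is a rough
sketch. SimplePersistenceManager and JournalIndex are simplified,
hypothetical stand-ins, not the actual PM interface; the point is only
that reads go through an index over the cluster journal while writes do
nothing, because the journal entry is already the persisted copy:

/** Simplified stand-in for a persistence manager interface. */
interface SimplePersistenceManager {
    byte[] load(String itemId);
    void store(String itemId, byte[] data);
}

/** Simplified stand-in for an index over the cluster journal / change log. */
interface JournalIndex {
    byte[] readLatestRecord(String itemId);
}

public class JournalBackedPersistenceManager implements SimplePersistenceManager {

    private final JournalIndex index;  // maps item ids to journal positions

    public JournalBackedPersistenceManager(JournalIndex index) {
        this.index = index;
    }

    /** Reads resolve the latest journal record for the item via the index. */
    public byte[] load(String itemId) {
        return index.readLatestRecord(itemId);
    }

    /** Writes are a no-op: the change was already appended to the journal,
        so the information is only persisted once. */
    public void store(String itemId, byte[] data) {
        // intentionally empty
    }
}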