On Dec 10, 2015, at 8:29 AM, Andy Seaborne <[email protected]> wrote:

> On 09/12/15 13:22, A. Soroka wrote:
>> So I’ve been casting about for some simple ways to deliver cheap and very
>> limited forms of MRMW that might provide some “bang for the buck” for Jena
>> users. One to which I keep coming back is the idea of a dataset with
>> per-graph transaction locking.
>
> That might combine well with using the Graph Store Protocol for graph
> management where default union graph is common.
>
> Another locking strategy is "graph node" i.e. a subject-in-a-graph and all
> its in and out links. Then it's MW when working on different parts of a
> graph. Different characteristics.

I’ve thought about this and variations on it, and read some papers by people who have developed other kinds of triplestore locking schemes. For simplicity I’m trying to understand the per-named-graph approach in the lattice of partitions of the dataset, if we think about the blocks of a partition as each taking an MR+SW lock. The trivial/greatest partition is the MR+SW dataset we are now introducing in 3.0.1. The finest/least partition is locking per-triple. I’m trying to find a scheme in the middle that is a reasonable “iteration forward” from what’s been done for 624 so far, reasonably performant, reasonably useful, etc.

The “graph node” approach doesn’t partition the dataset, but it is obviously of a finer/more sophisticated granularity than the per-named-graph approach, so could be superior in at least that way for at least some cases. But an approach that partitions the dataset is attractive to me because it can lend a natural form to the implementation (depending on how the partition is chosen) and is easy to reason about for application developers.

>> In other words, a dataset wherein each named graph has a SW lock and the
>> dataset as a whole can support MRMW, as long as every writer is working in a
>> different graph.
>
> What isolation levels do you want to support? With MW, this now matters.

I think we might have to go for snapshot isolation again for some of the reasons you gave, but I’m still trying to reason about this and also sniff out other issues. For example, I don’t understand lock interaction very well yet. Suppose Transaction1 acquires a W lock on Graph1 and a R lock on Graph2. Then Transaction2 commits a W to Graph2. I suppose Transaction1 should now fail? The fact that it held the two locks together implies that the mutations to Graph1 could have had some dependency on triples in Graph2 which have changed.

> That's not to say it's a bad idea - but that what "it" is matters.

Yes, the first email was just to start a conversation about whether this level of granularity meets the tests of “reasonableness”. The devil is in the details, but it is starting to sound to me like it might be worth nailing down some of those details.

>> Some examples of where this could be helpful:
>>
>> * Linked Data persistence backends in which the graph about each
>> HTTP-accessible resource is stored in a separate named graph.
>
> SPARQL graph Store Protocol.

Meaning that GSP is a good use case?

>> * Fast loading of quads from a sorted file (using multiple cursors).
>
> Different problem? That's threading inside a single SW?

Yes, perhaps it is. I was just “brainstorming”. I think there are other cases, though. Multi-tenancy comes to mind.

---
A. Soroka
The University of Virginia Library
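
To make the per-named-graph idea discussed above a little more concrete, here is a minimal Java sketch of one MR+SW lock per named graph, built only on `java.util.concurrent`. Everything in it is an assumption for illustration: `PerGraphLocks`, `read`, `write` and `tryWrite` are hypothetical names, not Jena API, and graphs are identified by plain strings. It deliberately says nothing about isolation levels, and it does not address the cross-graph case raised above (a transaction holding W on Graph1 and R on Graph2).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/** Hypothetical helper (not part of Jena): one MR+SW lock per named graph. */
final class PerGraphLocks {
    // Locks are created lazily, one per graph name, and never removed in this sketch.
    private final ConcurrentHashMap<String, ReentrantReadWriteLock> locks =
            new ConcurrentHashMap<>();

    private ReentrantReadWriteLock lockFor(String graphName) {
        return locks.computeIfAbsent(graphName, g -> new ReentrantReadWriteLock());
    }

    /** Runs the action under the graph's read lock; many readers may share it. */
    <T> T read(String graphName, Supplier<T> action) {
        Lock r = lockFor(graphName).readLock();
        r.lock();
        try {
            return action.get();
        } finally {
            r.unlock();
        }
    }

    /** Runs the action under the graph's write lock, blocking until it is free. */
    void write(String graphName, Runnable action) {
        Lock w = lockFor(graphName).writeLock();
        w.lock();
        try {
            action.run();
        } finally {
            w.unlock();
        }
    }

    /** Like write, but returns false at once if another thread holds the graph. */
    boolean tryWrite(String graphName, Runnable action) {
        Lock w = lockFor(graphName).writeLock();
        if (!w.tryLock()) {
            return false;
        }
        try {
            action.run();
            return true;
        } finally {
            w.unlock();
        }
    }
}
```

With this shape, two writers proceed concurrently as long as they name different graphs, while a second writer on the same graph either blocks (`write`) or is refused (`tryWrite`). Whether a reader of Graph2 should be failed when another transaction commits to Graph2 is exactly the part this sketch leaves open.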
