Re: per-graph locking for dataset?

A. Soroka Thu, 10 Dec 2015 09:51:35 -0800

On Dec 10, 2015, at 8:29 AM, Andy Seaborne <[email protected]> wrote:
> On 09/12/15 13:22, A. Soroka wrote:
>> So I’ve been casting about for some simple ways to deliver cheap and very 
>> limited forms of MRMW that might provide some “bang for the buck” for Jena 
>> users. One to which I keep coming back is the idea of a dataset with 
>> per-graph transaction locking.
> 
> That might combine well with using the Graph Store Protocol for graph 
> management where default union graph is common.
> Another locking strategy is "graph node" i.e. a subject-in-a-graph and all 
> its in and out links.  Then it's MW when working on different parts of a 
> graph.  Different characteristics.


I’ve thought about this and variations on it, and read some papers by people 
who have developed other kinds of triplestore locking schemes. For simplicity, 
I’m trying to place the per-named-graph approach in the lattice of partitions 
of the dataset, where each block of a partition takes its own MR+SW lock. The 
trivial (greatest, coarsest) partition is the MR+SW dataset we are now 
introducing in 3.0.1. The finest (least) partition is per-triple locking. I’m 
trying to find a scheme in the middle that is a reasonable “iteration forward” 
from what’s been done for JENA-624 so far: reasonably performant, reasonably 
useful, etc.

The “graph node” approach doesn’t partition the dataset, since a triple falls 
into the neighborhood of both its subject and its object, so the lock units 
overlap. It is obviously of a finer/more sophisticated granularity than the 
per-named-graph approach, though, so it could be superior in at least that way 
for at least some cases. But an approach that partitions the dataset is 
attractive to me because it can lend a natural form to the implementation 
(depending on how the partition is chosen) and is easy to reason about for 
application developers.
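
To make the per-named-graph partition concrete, here is a minimal sketch of one 
way the writer side could look. It is not Jena API: the class 
PerGraphWriterLocks and its methods are hypothetical names, graphs are 
identified by plain strings, and readers are assumed to take no lock at all 
(they would read from a snapshot), so only writers to the same graph exclude 
each other.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Semaphore;

// Hypothetical sketch, not Jena API: one single-writer permit per named graph.
// Readers are assumed to work from snapshots and never touch these permits.
class PerGraphWriterLocks {
    // One binary semaphore per block of the partition (here: per graph name).
    private final ConcurrentMap<String, Semaphore> writers = new ConcurrentHashMap<>();

    private Semaphore permitFor(String graphName) {
        return writers.computeIfAbsent(graphName, name -> new Semaphore(1));
    }

    // Returns true if the caller is now the only writer on this graph,
    // false if another writer already holds it. Never blocks.
    boolean tryBeginWrite(String graphName) {
        return permitFor(graphName).tryAcquire();
    }

    // Must be called exactly once for each successful tryBeginWrite.
    void endWrite(String graphName) {
        permitFor(graphName).release();
    }
}
```

With this shape, two writers on different graphs both get true from 
tryBeginWrite, while a second writer on the same graph gets false until the 
first calls endWrite. Choosing a different partition would only change what the 
map is keyed on.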

>> In other words, a dataset wherein each named graph has a SW lock and the 
>> dataset as a whole can support MRMW, as long as every writer is working in a 
>> different graph.
> What isolation levels do you want to support?  With MW, this now matters.

I think we might have to go for snapshot isolation again for some of the 
reasons you gave, but I’m still trying to reason about this and also sniff out 
other issues. For example, I don’t understand lock interaction very well yet. 
Suppose Transaction1 acquires a W lock on Graph1 and an R lock on Graph2. Then 
Transaction2 commits a write to Graph2. I suppose Transaction1 should now fail? 
The fact that it held the two locks together implies that the mutations to 
Graph1 could have depended on triples in Graph2 that have since changed.
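
One possible way to get the “Transaction1 should fail” behavior is to validate 
at commit time against per-graph version counters. The sketch below is only an 
illustration under that assumption: GraphVersions and SketchTxn are 
hypothetical names, not Jena classes, and the per-graph W locks themselves are 
left out. Note that plain snapshot isolation, which only checks write-write 
conflicts, would let both transactions in this scenario commit, so this check 
is stricter than that.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a commit counter per named graph.
class GraphVersions {
    private final Map<String, AtomicLong> versions = new ConcurrentHashMap<>();

    long current(String graph) {
        return versions.computeIfAbsent(graph, g -> new AtomicLong()).get();
    }

    void bump(String graph) {
        versions.computeIfAbsent(graph, g -> new AtomicLong()).incrementAndGet();
    }
}

// Hypothetical sketch: a transaction that remembers the version of every
// graph it touched and refuses to commit if any of them has moved on.
class SketchTxn {
    private final GraphVersions store;
    private final Map<String, Long> seen = new HashMap<>();
    private final Set<String> written = new HashSet<>();

    SketchTxn(GraphVersions store) { this.store = store; }

    // Record the version of the graph the first time it is touched.
    void read(String graph) { seen.putIfAbsent(graph, store.current(graph)); }

    void write(String graph) { read(graph); written.add(graph); }

    // Validate and publish atomically; false means the transaction must abort.
    boolean commit() {
        synchronized (store) {
            for (Map.Entry<String, Long> e : seen.entrySet()) {
                if (store.current(e.getKey()) != e.getValue()) return false;
            }
            for (String graph : written) store.bump(graph);
            return true;
        }
    }
}
```

In the scenario above, Transaction1 calls write("Graph1") and read("Graph2"), 
Transaction2 calls write("Graph2") and commits, and Transaction1's commit() 
then returns false because Graph2's counter has moved.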

> That's not to say it's a bad idea - but that what "it" is matters.

Yes, the first email was just to start a conversation about whether this level 
of granularity meets the tests of “reasonableness”. The devil is in the 
details, but it is starting to sound to me like it might be worth nailing down 
some of those details.

>> Some examples of where this could be helpful:
>> 
>> * Linked Data persistence backends in which the graph about each
>> HTTP-accessible resource is stored in a separate named graph.
> 
> SPARQL Graph Store Protocol.

Meaning that GSP is a good use case?

>> * Fast loading of quads from a sorted file (using multiple cursors).
> 
> Different problem?  That's threading inside a single SW?

Yes, perhaps it is. I was just “brainstorming”. I think there are other cases, 
though. Multi-tenancy comes to mind.

---
A. Soroka
The University of Virginia Library
