> On Jan 2, 2016, at 1:59 PM, Andy Seaborne <[email protected]> wrote:
>
> On 31/12/15 15:07, A. Soroka wrote:
>>
>> On another note, I’m taking a bash at the “lock-per-named-graph” dataset.
>> Hopefully I’ll have something soonish, that can be run in harness to see
>> if it really offers useful gains in the use cases for which I hope it
>> will. If it works, then maybe it would be worth abstracting to the case
>> of arbitrary partitions of a dataset.
>
> Decentralise! No hard dependency on codebase changes - a separate
> implementation to evolve and test out without the other evolution needs
> of the codebase to get in the way.

I’m trying to draft a completely independent DatasetGraph implementation, which I should think would meet this criterion, although I have run into a problem similar to the one Claude outlined in a recent message: the current Lock, Transactional and TransactionalComponent types aren’t really set up to contemplate taking a lock or opening a transaction with some particular scope in the data, so new types have to be introduced to support that. Not a huge deal, though. I’m looking forward to seeing what Claude comes up with, because he is working on a more general problem.
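To make that concrete, here is the flavor of new type I have in mind. This is a rough sketch only, with invented names (GraphScopedTransactional and its methods are placeholders, not anything that exists in the codebase today):

    import org.apache.jena.graph.Node;
    import org.apache.jena.query.ReadWrite;

    /**
     * Hypothetical companion to Transactional: transactions scoped to a
     * single named graph rather than to the whole dataset. All names here
     * are placeholders, not a proposal for the actual API.
     */
    public interface GraphScopedTransactional {

        /** Begin a transaction covering only the given named graph. */
        void begin(Node graphName, ReadWrite readWrite);

        /** Commit the transaction scoped to that graph. */
        void commit(Node graphName);

        /** Abort the transaction scoped to that graph. */
        void abort(Node graphName);
    }

A DatasetGraph that partitions locking by graph could then implement something of this shape alongside the ordinary whole-dataset Transactional.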
> See also Claude's message on locking.

Yes, the efforts clearly connect. I’m hopeful that “writer-per-graph” turns out to be a special case of Claude’s idea (with a lock-region pattern of <g ANY ANY ANY>). I’m also somewhat hopeful that we can analyze some of the problems in the general idea by building up from simpler patterns (e.g. lock regions that partition the data).

>> Did you want me to look at this:
>> https://issues.apache.org/jira/browse/JENA-1084 ?
>
> That would be great.

Happy to: please assign it to me. (I can’t self-assign in Jira.)

>> I was thinking that I should be able to reuse the current
>> TripleStore/TripleBunch machinery underneath the TripleTable and
>> QuadTable interfaces, or possibly just try a very simple
>> ConcurrentHashMap setup.
>
> Personally, I would not use TripleBunch for such a dataset implementation
> as first choice.
> <snipped>
> A really good thing to learn from JENA-1084 is the cost of the persistent
> datastructures. Same framework, different index maps gives the most
> realistic results; using TripleBunch is covered by “dataset general”.

Okay, I’ll plug in some basic java.util Maps and see what we get!
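Just to show the shape of that experiment: a minimal sketch, everything about it hypothetical (find() with wildcard handling, secondary indexes, and so on all elided), meant only as a baseline to measure against:

    import java.util.Map;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    import org.apache.jena.graph.Node;
    import org.apache.jena.sparql.core.Quad;

    /**
     * Hypothetical first cut at a quad index: quads bucketed by graph name
     * in plain java.util.concurrent maps. Not a design, just a baseline.
     */
    public class SimpleMapQuadTable {

        private final Map<Node, Set<Quad>> quadsByGraph = new ConcurrentHashMap<>();

        public void add(Quad q) {
            quadsByGraph
                .computeIfAbsent(q.getGraph(), g -> ConcurrentHashMap.newKeySet())
                .add(q);
        }

        public void delete(Quad q) {
            Set<Quad> quads = quadsByGraph.get(q.getGraph());
            if (quads != null)
                quads.remove(q);
        }
    }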
> One nice part of TripleBunch is the switch from small lists to maps as
> size grows. (The comments in some places are wrong - they say it switches
> at 4 but the impl is switch at 9 :-)

https://issues.apache.org/jira/browse/JENA-1109 :)

> Do you think that there is an equivalent idea in TxnMem? My guess is that
> the answer is "no" because the base maps aren't hash maps, which is the
> space overhead cost that small lists is trying to avoid.

Well, the library we are now using for persistent data structures (https://github.com/andrewoma/dexx) does provide fairly cheap persistent lists, but I suspect that the cost at the “breakover” to go to a map might be high. It might be worth looking at, though.

There is an intriguing possibility in an idea that Clojure provides as “transient” data structures. The idea is that within the remit of a given thread (really, for us, a transaction, but Clojure, for obvious reasons, isn’t going to speak in that language about basic data structures) it could be possible to mutate-in-place a normally persistent data structure, with cost savings in time and space. The current design of TxnMem could take advantage of this without much effort, but the library doesn’t offer that feature. I did try Clojure’s structures, but they weren’t as performant as Dexx in the simple tests we did, and the use of Clojure libraries from Java is… not pretty (maintenance headaches ahead!). However, we might get to the point that specialized persistent data structure implementations for Jena would be worthwhile, and the “transient” idea should definitely be part of that.
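The contract I mean, very roughly (these types are invented purely for illustration; they are not what dexx or Clojure actually expose):

    /**
     * Rough sketch of the "transient" idea, with invented types: a
     * persistent map hands out a mutate-in-place view to a single owner
     * (for us, one transaction), and that view freezes back into an
     * ordinary persistent value at commit time.
     */
    public interface PersistentMap<K, V> {

        /** Copy-on-write update, as our persistent maps behave now. */
        PersistentMap<K, V> put(K key, V value);

        /** Cheap handoff to a single-owner, mutable-in-place view. */
        TransientMap<K, V> asTransient();

        interface TransientMap<K, V> {

            /** Mutates in place; only the owning transaction should call this. */
            void put(K key, V value);

            /** Freeze back into an immutable persistent value. */
            PersistentMap<K, V> freeze();
        }
    }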
> In terms of simplification, an eye on swapping (in the far future) to
> graphs being special datasets (with transactions!), i.e. just the default
> graph, is an interesting possibility to create a smaller codebase. That's
> my current best guess for unifying transactions but there are various
> precursors.

I think that sounds great. Maybe we can take a step there by making those Model and Graph impls that wrap “pieces” of datasets respect the transactionality of those datasets.

> But the immediate thing I'm getting round to is finishing the TxnMem work
> - Fuseki integration is missing and something that needs user testing
> well before a release.

Yes, definitely. I feel like that’s on me to carry through, but I haven’t looked at Fuseki much at all. Would you like to file a ticket on me for that? Or have you already started one?

> Random thought: Fuseki is the way to test various impls - we could build
> a kit (low threshold to use) and ask people to report figures for
> different environments.

A “kit” meaning something like a specially-advertised one-off release of the Fuseki download that includes the new stuff? That sounds like a great way to lower the bar to getting feedback, and a great technique in general for advertising new features and building up interest.

---
A. Soroka
The University of Virginia Library