Hi Rob,

Thanks for sharing your perspective. The in-memory graphs are exactly what I work with most and where I might have the skills to contribute to the project. My idea might actually only be relevant to the --mem use case.
I have no idea about the inner workings of TDB, as we (at work) needed a temporary database where all changes are versioned.

Arne

On Mon, 22 May 2023 at 11:07, Rob @ DNR <rve...@dotnetrdf.org> wrote:

> Fuseki is effectively the Jena project's database server that allows
> sharing a single Jena Dataset amongst many processes and users.
>
> This means that users expect database-server-like behaviour, i.e.,
> transactions and read isolation, which the transactional in-memory dataset
> provides, when running Fuseki in the in-memory mode.
>
> I’m not sure about the full context of that comment but I don’t think
> that’s entirely true. It depends on how the user starts and runs Fuseki.
> Most people who want a persistent dataset would be using TDB, which has its
> own completely independent Dataset implementation, query executor and
> persistent data structures.
>
> Broadly speaking, users of Fuseki run it in 3 main ways:
>
> 1. With TDB (the --loc=/path/to/db flag)
> 2. In memory (the --mem flag)
> 3. With a configuration file (the --config flag)
>
> For 1, DatasetGraphInMemory doesn’t get used AFAIK; the TDB-specific
> implementations are used instead. For 2, it’s the default dataset. For 3,
> it will depend on what the user has placed in their configuration file and
> might be a mixture of 1 and 2 plus inference, ancillary index wrappers
> (text/geospatial indexing) etc.
>
> Again, I think you’re getting hung up on the wrong thing here. An improved
> in-memory Graph implementation will have benefits, but it won’t necessarily
> be for all use cases. There are plenty of use cases where you do just want
> to briefly load/generate a bunch of RDF in-memory, manipulate it and move
> on, which an improved in-memory implementation will greatly benefit.
>
> Fuseki, as a database server, likely won’t benefit (except perhaps in some
> people’s custom configuration setups). However, people who want performance
> with Fuseki should already be using TDB anyway.
>
> Hope this helps,
>
> Rob
>
> From: Arne Bernhardt <arne.bernha...@gmail.com>
> Date: Friday, 19 May 2023 at 21:21
> To: dev@jena.apache.org <dev@jena.apache.org>
> Subject: Why DatasetGraphInMemory?
>
> Hi,
>
> in a recent response
> <https://github.com/apache/jena/issues/1867#issuecomment-1546931793> to an
> issue it was said that "Fuseki - uses DatasetGraphInMemory mostly".
> For my PR <https://github.com/apache/jena/pull/1865>, I added a JMH
> benchmark suite to the project. So it was easy for me to compare the
> performance of GraphMem with
> "DatasetGraphFactory.createTxnMem().getDefaultGraph()".
> DatasetGraphInMemory is much slower in every discipline tested (#add,
> #delete, #contains, #find, #stream).
> Maybe my approach is too naive?
>
> I understand very well that the underlying Dexx Collections Framework, with
> its immutable persistent data structures, makes threading and transaction
> handling easy and that there are no issues with consuming iterators or
> streams even after a read transaction has closed.
> Is it currently supported for consumers to use iterators and streams after
> a transaction has been closed? If so, I don't currently see an easy way to
> replace DatasetGraphInMemory with a faster implementation (although
> transaction-aware iterators that copy the remaining elements into lists
> could be an option).
>
> Are there other reasons why DatasetGraphInMemory is the preferred dataset
> implementation for Fuseki?
>
> Cheers,
> Arne
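
As a rough illustration of the "transaction-aware iterators that copy the remaining elements into lists" idea from the original message, here is a minimal sketch in plain Java. `DetachableIterator` and its `detach()` method are hypothetical names and are not part of Jena. The sketch assumes that a transaction would keep track of its open iterators and call `detach()` on each of them before it ends, and it ignores thread safety entirely.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical helper (not a Jena class): an iterator that can be cut loose
 * from its live source by copying the not-yet-consumed elements into a list.
 */
final class DetachableIterator<T> implements Iterator<T> {
    // Starts as the live iterator; after detach() it iterates over a private copy.
    private Iterator<T> current;

    DetachableIterator(Iterator<T> live) {
        this.current = live;
    }

    /**
     * Assumed to be called by the owning transaction just before it ends.
     * Drains the remaining elements into a list so the consumer can keep
     * iterating after the underlying structure has been modified.
     */
    void detach() {
        List<T> remaining = new ArrayList<>();
        while (current.hasNext()) {
            remaining.add(current.next());
        }
        current = remaining.iterator();
    }

    @Override
    public boolean hasNext() {
        return current.hasNext();
    }

    @Override
    public T next() {
        return current.next();
    }
}
```

The cost of this approach is the copy at the end of the transaction, which is proportional to the number of elements not yet consumed, so it would presumably only pay off if most iterators are fully consumed before their transaction closes.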