Fuseki is effectively the Jena project's database server that allows sharing a single Jena Dataset amongst many processes and users.
This means that users expect database-server-like behaviour, i.e., transactions and read isolation, which the transactional in-memory dataset provides when running Fuseki in the in-memory mode.

I’m not sure about the full context of that comment, but I don’t think that’s entirely true. It depends on how the user starts and runs Fuseki. Most people who want a persistent dataset would be using TDB, which has its own completely independent Dataset implementation, query executor and persistent data structures.

Broadly speaking, users of Fuseki run it in 3 main ways:

1. With TDB (the --loc=/path/to/db flag)
2. In memory (the --mem flag)
3. With a configuration file (the --config flag)

For 1, DatasetGraphInMemory doesn’t get used AFAIK; the TDB-specific implementations are used instead. For 2, it’s the default dataset. For 3, it will depend on what the user has placed in their configuration file and might be a mixture of 1 and 2 plus inference, ancillary index wrappers (text/geospatial indexing), etc.

Again, I think you’re getting hung up on the wrong thing here. An improved in-memory Graph implementation will have benefits, but not necessarily for all use cases. There are plenty of use cases where you just want to briefly load or generate a bunch of RDF in memory, manipulate it and move on, and those will benefit greatly from an improved in-memory implementation. Fuseki, as a database server, likely won’t benefit (except perhaps in some people’s custom configuration setups). However, people who want performance with Fuseki should already be using TDB anyway.

Hope this helps,

Rob

From: Arne Bernhardt <arne.bernha...@gmail.com>
Date: Friday, 19 May 2023 at 21:21
To: dev@jena.apache.org <dev@jena.apache.org>
Subject: Why DatasetGraphInMemory?

Hi,

In a recent response <https://github.com/apache/jena/issues/1867#issuecomment-1546931793> to an issue, it was said that "Fuseki - uses DatasetGraphInMemory mostly".
For my PR <https://github.com/apache/jena/pull/1865>, I added a JMH benchmark suite to the project, which made it easy to compare the performance of GraphMem with "DatasetGraphFactory.createTxnMem().getDefaultGraph()". DatasetGraphInMemory is much slower in every operation tested (#add, #delete, #contains, #find, #stream). Maybe my approach is too naive?

I understand very well that the underlying Dexx Collections Framework, with its immutable persistent data structures, makes threading and transaction handling easy, and that there are no issues with consuming iterators or streams even after a read transaction has closed.

Is it currently supported for consumers to use iterators and streams after a transaction has been closed? If so, I don't currently see an easy way to replace DatasetGraphInMemory with a faster implementation, although transaction-aware iterators that copy the remaining elements into lists could be an option.

Are there other reasons why DatasetGraphInMemory is the preferred dataset implementation for Fuseki?

Cheers,
Arne
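The idea of transaction-aware iterators that copy the remaining elements into lists can be sketched in plain Java. This is a minimal, hypothetical sketch only: `ReadTxn`, `DetachableIterator` and `detach()` are invented for illustration and are not Jena APIs, and a plain mutable `List` stands in for the graph's underlying storage. When the read transaction ends, every iterator it handed out drains its unconsumed elements into a private copy, so the consumer can keep iterating after the transaction has closed even if a writer later changes the data.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical (not a Jena API): an iterator that is "live" inside a read
// transaction and switches to a private copy of its remainder when detached.
class DetachableIterator<T> implements Iterator<T> {
    private Iterator<T> source;   // live iterator; only safe while the transaction is open
    private boolean detached = false;

    DetachableIterator(Iterator<T> source) {
        this.source = source;
    }

    // Copy the not-yet-consumed elements into a list and iterate over that instead.
    void detach() {
        if (detached) return;
        List<T> remaining = new ArrayList<>();
        source.forEachRemaining(remaining::add);
        source = remaining.iterator();
        detached = true;
    }

    boolean isDetached() { return detached; }

    @Override public boolean hasNext() { return source.hasNext(); }

    @Override public T next() {
        if (!source.hasNext()) throw new NoSuchElementException();
        return source.next();
    }
}

// Hypothetical read transaction that tracks the iterators it has handed out.
class ReadTxn {
    private final List<DetachableIterator<?>> open = new ArrayList<>();
    private boolean active = true;

    <T> DetachableIterator<T> iterate(Iterable<T> data) {
        if (!active) throw new IllegalStateException("transaction has ended");
        DetachableIterator<T> it = new DetachableIterator<>(data.iterator());
        open.add(it);
        return it;
    }

    // On end, detach every open iterator so it no longer touches the shared data.
    void end() {
        for (DetachableIterator<?> it : open) it.detach();
        open.clear();
        active = false;
    }
}

class DetachDemo {
    public static void main(String[] args) {
        List<String> triples = new ArrayList<>(List.of("t1", "t2", "t3"));
        ReadTxn txn = new ReadTxn();
        DetachableIterator<String> it = txn.iterate(triples);
        System.out.println(it.next());    // consumed inside the transaction
        txn.end();                        // remaining elements are copied here
        triples.clear();                  // a later writer mutates the shared data
        while (it.hasNext()) System.out.println(it.next());   // still sees t2, t3
    }
}
```

The trade-off this makes visible: copying costs time and memory proportional to whatever the consumer has not yet read, and that cost is paid when the transaction ends. With immutable persistent data structures such as Dexx's, an iterator can simply keep referencing the old version, so no copy is needed at all, which is one reason that design makes post-transaction iteration easy.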