Re: Why DatasetGraphInMemory?

Arne Bernhardt Mon, 22 May 2023 03:00:10 -0700

Hi Rob,
thanks for sharing your perspective.

The in-memory graphs are exactly what I work with most and where I might
have the skills to contribute to the project.
My idea might actually only be relevant to the --mem use case.


I have no idea about the inner workings of TDB, as at work we needed a
temporary database where all changes are versioned.

    Arne



On Mon, 22 May 2023 at 11:07, Rob @ DNR <rve...@dotnetrdf.org> wrote:

> Fuseki is effectively the Jena project's database server, which allows
> sharing a single Jena Dataset amongst many processes and users.
>
> This means that users expect database-server-like behaviour, i.e.,
> transactions and read isolation, which the transactional in-memory dataset
> provides when running Fuseki in the in-memory mode.
>
> I’m not sure about the full context of that comment, but I don’t think
> that’s entirely true.  It depends on how the user starts and runs Fuseki.
> Most people who want a persistent dataset would be using TDB, which has its
> own completely independent Dataset implementation, query executor and
> persistent data structures.
>
> Broadly speaking, users of Fuseki run it in 3 main ways:
>
>   *   With TDB (the --loc=/path/to/db flag)
>   *   In Memory (the --mem flag)
>   *   With a configuration file (--config flag)
>
> For 1, DatasetGraphInMemory doesn’t get used AFAIK; the TDB-specific
> implementations are used instead.  For 2, it’s the default dataset.  For 3,
> it will depend on what the user has placed in their configuration file and
> might be a mixture of 1 and 2 plus inference, ancillary index wrappers
> (text/geospatial indexing), etc.
>
> Again, I think you’re getting hung up on the wrong thing here. An improved
> in-memory Graph implementation will have benefits, but not necessarily
> for all use cases.  There are plenty of use cases where you just want
> to briefly load/generate a bunch of RDF in memory, manipulate it and move
> on, and those will benefit greatly from an improved in-memory implementation.
>
> Fuseki, as a database server, likely won’t benefit (except perhaps in some
> people’s custom configuration setups).  However, people who want performance
> with Fuseki should already be using TDB anyway.
>
> Hope this helps,
>
> Rob
>
> From: Arne Bernhardt <arne.bernha...@gmail.com>
> Date: Friday, 19 May 2023 at 21:21
> To: dev@jena.apache.org <dev@jena.apache.org>
> Subject: Why DatasetGraphInMemory?
> Hi,
> in a recent response
> <https://github.com/apache/jena/issues/1867#issuecomment-1546931793> to an
> issue, it was said that "Fuseki - uses DatasetGraphInMemory mostly".
> For my PR <https://github.com/apache/jena/pull/1865>, I added a JMH
> benchmark suite to the project, so it was easy for me to compare the
> performance of GraphMem with
> "DatasetGraphFactory.createTxnMem().getDefaultGraph()".
> The default graph of DatasetGraphInMemory is much slower than GraphMem in
> every operation tested (#add, #delete, #contains, #find, #stream).
> Maybe my approach is too naive?
> I understand very well that the underlying Dexx Collections Framework, with
> its immutable persistent data structures, makes threading and transaction
> handling easy, and that there are no issues with consuming iterators or
> streams even after a read transaction has closed.
> Is it currently supported for consumers to use iterators and streams after
> a transaction has been closed? If so, I don't currently see an easy way to
> replace DatasetGraphInMemory with a faster implementation (although
> transaction-aware iterators that copy the remaining elements into lists
> could be an option).
> Are there other reasons why DatasetGraphInMemory is the preferred dataset
> implementation for Fuseki?
>
> Cheers,
> Arne
>
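The option of transaction-aware iterators that copy the remaining elements into lists could look roughly like the following plain-Java sketch. It does not use the Jena API: `Store`, `DetachingIterator`, `beginRead`, `find` and `endRead` are hypothetical names invented for illustration, the store is single-threaded, and a real DatasetGraph would need proper locking and write transactions. The idea is that iteration stays cheap while the read transaction is open, and only iterators still open when it ends pay for copying their unconsumed tail.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Jena code: a mutable in-memory store whose
// iterators remain usable after the read transaction that created them ends.
public class TxnAwareIteratorSketch {

    static final class Store<T> {
        private final Set<T> data = new HashSet<>();
        private final List<DetachingIterator<T>> openIterators = new ArrayList<>();
        private boolean inRead = false;

        void add(T item) {
            // Writes are rejected while a read transaction is active.
            if (inRead) {
                throw new IllegalStateException("write during read transaction");
            }
            data.add(item);
        }

        void beginRead() {
            inRead = true;
        }

        Iterator<T> find() {
            if (!inRead) {
                throw new IllegalStateException("no active read transaction");
            }
            // Iterate the live set directly: no copy while the transaction is open.
            DetachingIterator<T> it = new DetachingIterator<>(data.iterator());
            openIterators.add(it);
            return it;
        }

        void endRead() {
            // Copy the unconsumed tail of every open iterator into a list
            // before writers are allowed to modify 'data' again.
            for (DetachingIterator<T> it : openIterators) {
                it.detach();
            }
            openIterators.clear();
            inRead = false;
        }
    }

    static final class DetachingIterator<T> implements Iterator<T> {
        private Iterator<T> source;

        DetachingIterator(Iterator<T> source) {
            this.source = source;
        }

        // Switch from the live set's iterator to a private list of the
        // elements not yet returned.
        void detach() {
            List<T> rest = new ArrayList<>();
            source.forEachRemaining(rest::add);
            source = rest.iterator();
        }

        @Override
        public boolean hasNext() {
            return source.hasNext();
        }

        @Override
        public T next() {
            return source.next();
        }
    }

    public static void main(String[] args) {
        Store<String> store = new Store<>();
        store.add("a");
        store.add("b");
        store.add("c");

        store.beginRead();
        Iterator<String> it = store.find();
        it.next();          // consume one element inside the transaction
        store.endRead();    // the two remaining elements are copied into a list

        store.add("d");     // a later write does not affect the detached iterator
        int remaining = 0;
        while (it.hasNext()) {
            it.next();
            remaining++;
        }
        System.out.println("remaining after endRead: " + remaining);
    }
}
```

Running `main` prints `remaining after endRead: 2`. Compared with persistent structures such as those in Dexx, where an iterator simply keeps reading an old version that writers never mutate, this approach trades structural sharing for a one-off copy of whatever was left unconsumed.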
