[
https://issues.apache.org/jira/browse/JENA-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367114#comment-14367114
]
Stian Soiland-Reyes commented on JENA-624:
------------------------------------------
jena-arq has these dataset classes:
https://github.com/apache/jena/blob/master/jena-arq/src/main/java/com/hp/hpl/jena/sparql/core/DatasetImpl.java
https://github.com/apache/jena/blob/master/jena-arq/src/main/java/com/hp/hpl/jena/sparql/core/DatasetGraphSimpleMem.java
A Dataset can be seen as a collection of named Graphs, together with one
default (unnamed) graph.
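To make that idea concrete, here is a minimal sketch in plain Java. It is not
Jena code: the class name NamedGraphsSketch and its methods are made up for
illustration, triples are simplified to lists of three strings, and a null
graph name is assumed to mean the default graph.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not a Jena class.
final class NamedGraphsSketch {
    // Triples in the default (unnamed) graph.
    private final Set<List<String>> defaultGraph = new HashSet<>();
    // Graph name (an IRI string here) -> triples in that graph.
    private final Map<String, Set<List<String>>> namedGraphs = new HashMap<>();

    // Adds a quad; a null graph name targets the default graph.
    void add(String graphName, String s, String p, String o) {
        Set<List<String>> g = (graphName == null)
                ? defaultGraph
                : namedGraphs.computeIfAbsent(graphName, k -> new HashSet<>());
        g.add(List.of(s, p, o));
    }

    // Returns a read-only view of one graph; empty if the name is unknown.
    Set<List<String>> graph(String graphName) {
        Set<List<String>> g = (graphName == null) ? defaultGraph : namedGraphs.get(graphName);
        return g == null ? Collections.emptySet() : Collections.unmodifiableSet(g);
    }

    // Returns the names of all named graphs that hold at least one triple.
    Set<String> graphNames() {
        return Collections.unmodifiableSet(namedGraphs.keySet());
    }
}
```

A general-purpose container like this is easy to write but does nothing to
help query execution, which is the gap the proposal is about.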
jena-core has an in-memory Graph model here:
https://github.com/apache/jena/blob/master/jena-core/src/main/java/com/hp/hpl/jena/mem/GraphTripleStoreBase.java
(you will have to hunt around the different class hierarchies in an IDE to get
a better understanding)
jena-tdb has a model based on memory-mapped files with efficient lookup for
queries. You can ask for this to run in memory-only mode (basically using a
byte[], I believe). This GSOC proposal, however, is to take inspiration from
the efficient parts of TDB (e.g. the index trees) and create an in-memory
Dataset implementation that performs well with complex SPARQL queries and
larger datasets (e.g. a couple of GB). Thus you should not need to work at
byte level, but you will still need efficient data structures (in terms of
both speed and memory usage).
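As a rough idea of what "index trees without bytes" could look like, here is a
small sketch in plain Java. It is only an illustration under my own
assumptions, not how TDB or any Jena class is implemented: the class
TripleIndexSketch is hypothetical, terms are plain strings, and the three
orderings (SPO, POS, OSP) are kept as nested sorted maps so that any triple
pattern can be answered by a prefix lookup on one of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration; not part of Jena or TDB.
final class TripleIndexSketch {
    // Receives one index entry in the index's own key order.
    private interface Sink { void accept(String a, String b, String c); }

    // Three orderings of the same triples: first key -> second key -> third keys.
    private final TreeMap<String, TreeMap<String, TreeSet<String>>> spo = new TreeMap<>();
    private final TreeMap<String, TreeMap<String, TreeSet<String>>> pos = new TreeMap<>();
    private final TreeMap<String, TreeMap<String, TreeSet<String>>> osp = new TreeMap<>();

    // Adds a triple to all three indexes; returns false if it was already present.
    boolean add(String s, String p, String o) {
        if (!put(spo, s, p, o)) return false;
        put(pos, p, o, s);
        put(osp, o, s, p);
        return true;
    }

    // Finds triples matching a pattern; null is a wildcard. Each result is {s, p, o}.
    List<String[]> find(String s, String p, String o) {
        List<String[]> out = new ArrayList<>();
        if (s != null && p == null && o != null) {
            // "s ? o" has no usable SPO prefix, so use OSP with prefix (o, s).
            scan(osp, o, s, null, (a, b, c) -> out.add(new String[] {b, c, a}));
        } else if (s != null) {
            scan(spo, s, p, o, (a, b, c) -> out.add(new String[] {a, b, c}));
        } else if (p != null) {
            scan(pos, p, o, null, (a, b, c) -> out.add(new String[] {c, a, b}));
        } else {
            // Only o may be bound here; with nothing bound this is a full scan.
            scan(osp, o, null, null, (a, b, c) -> out.add(new String[] {b, c, a}));
        }
        return out;
    }

    private static boolean put(TreeMap<String, TreeMap<String, TreeSet<String>>> idx,
                               String a, String b, String c) {
        return idx.computeIfAbsent(a, k -> new TreeMap<>())
                  .computeIfAbsent(b, k -> new TreeSet<>())
                  .add(c);
    }

    // Walks one index; the bound keys must form a prefix (a, then b, then c).
    private static void scan(TreeMap<String, TreeMap<String, TreeSet<String>>> idx,
                             String a, String b, String c, Sink sink) {
        for (Map.Entry<String, TreeMap<String, TreeSet<String>>> e1 : narrow(idx, a).entrySet()) {
            for (Map.Entry<String, TreeSet<String>> e2 : narrow(e1.getValue(), b).entrySet()) {
                NavigableSet<String> thirds = (c == null)
                        ? e2.getValue()
                        : e2.getValue().subSet(c, true, c, true);
                for (String third : thirds) {
                    sink.accept(e1.getKey(), e2.getKey(), third);
                }
            }
        }
    }

    // Restricts a sorted map to a single key, or returns it unchanged for a wildcard.
    private static <V> NavigableMap<String, V> narrow(TreeMap<String, V> m, String key) {
        return key == null ? m : m.subMap(key, true, key, true);
    }
}
```

For example, after adding a few triples, find("s1", null, "o2") is answered
from the OSP ordering rather than by scanning everything under s1. A real
implementation would presumably add a graph component for quads, use compact
node ids instead of strings, and think hard about memory per triple and
concurrent access, none of which this sketch attempts.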
> Develop a new in-memory RDF Dataset implementation
> --------------------------------------------------
>
> Key: JENA-624
> URL: https://issues.apache.org/jira/browse/JENA-624
> Project: Apache Jena
> Issue Type: Improvement
> Reporter: Andy Seaborne
> Labels: gsoc, gsoc2015, java, linked_data, rdf
>
> The current (Jan 2014) Jena in-memory dataset uses a general purpose
> container that works for any storage technology for graphs together with
> in-memory graphs.
> This project would develop a new implementation design specifically for RDF
> datasets (triples and quads) and efficient SPARQL execution, for example,
> using multi-core parallel operations and/or multi-version concurrent
> data structures to maximise true parallel operation.
> This is a system project suitable for someone interested in database
> implementation, data structure design and implementation, operating systems or
> distributed systems.
> Note that TDB can operate in-memory using a simulated disk with
> copy-in/copy-out semantics for disk-level operations. It is for faithful
> testing of TDB infrastructure and is not designed for performance, general
> in-memory use or use at scale. While lessons may be learnt from that system, TDB
> in-memory is not the answer here.