[ https://issues.apache.org/jira/browse/JENA-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367114#comment-14367114 ]
Stian Soiland-Reyes commented on JENA-624:
------------------------------------------

jena-arq has
https://github.com/apache/jena/blob/master/jena-arq/src/main/java/com/hp/hpl/jena/sparql/core/DatasetImpl.java
https://github.com/apache/jena/blob/master/jena-arq/src/main/java/com/hp/hpl/jena/sparql/core/DatasetGraphSimpleMem.java

A Dataset can be seen as a collection of named Graphs. jena-core has an in-memory Graph model here:
https://github.com/apache/jena/blob/master/jena-core/src/main/java/com/hp/hpl/jena/mem/GraphTripleStoreBase.java
(you will have to hunt around the different class hierarchies in an IDE to get a better understanding).

jena-tdb has a model based on memory-mapped files, with efficient lookup for queries. You can ask for it to run in memory-only mode (basically using a byte[], I believe), but this GSOC proposal is to be inspired by the efficient parts of TDB (e.g. the index trees) and to create an in-memory Dataset implementation that performs well with complex SPARQL queries and larger datasets (e.g. a couple of GB). Thus you should not need to play at the byte level, but you will still need to build efficient data structures (in terms of both speed and memory usage).

> Develop a new in-memory RDF Dataset implementation
> --------------------------------------------------
>
>                 Key: JENA-624
>                 URL: https://issues.apache.org/jira/browse/JENA-624
>             Project: Apache Jena
>          Issue Type: Improvement
>            Reporter: Andy Seaborne
>              Labels: gsoc, gsoc2015, java, linked_data, rdf
>
> The current (Jan 2014) Jena in-memory dataset uses a general-purpose
> container that works for any storage technology for graphs, together with
> in-memory graphs.
> This project would develop a new implementation design specifically for RDF
> datasets (triples and quads) and efficient SPARQL execution, for example
> using multi-core parallel operations and/or multi-version concurrent
> data structures to maximise true parallel operation.
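To make the idea discussed above a little more concrete (a dataset held as quads, indexed in several orders the way TDB uses index trees, but kept as plain in-memory objects), here is a minimal, hypothetical sketch. It is not Jena or TDB code: the names `QuadIndexSketch`, `Quad`, `add` and `find` are invented for illustration, plain strings stand in for RDF nodes, and the six orderings (GSPO, GPOS, GOSP, SPOG, POSG, OSPG) are one assumed choice under which every quad pattern, whatever positions are bound, becomes a range scan over a sorted index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.concurrent.ConcurrentSkipListSet;

/** Hypothetical sketch (not Jena/TDB code): an in-memory quad store with several sorted indexes. */
public class QuadIndexSketch {

    /** A quad in canonical (graph, subject, predicate, object) order; strings stand in for RDF nodes. */
    public record Quad(String g, String s, String p, String o) {
        String[] fields() { return new String[] {g, s, p, o}; }
    }

    /** Keys are compared field by field, lexicographically. */
    private static final Comparator<String[]> KEY_ORDER = (a, b) -> Arrays.compare(a, b);

    /** One index: a permutation of the canonical positions plus a sorted, concurrent set of permuted keys. */
    private static final class Index {
        final int[] order; // e.g. {1, 2, 3, 0} stores keys as S, P, O, G
        final NavigableSet<String[]> keys = new ConcurrentSkipListSet<>(KEY_ORDER);

        Index(int... order) { this.order = order; }

        String[] permute(String[] canonical) {
            String[] key = new String[4];
            for (int i = 0; i < 4; i++) key[i] = canonical[order[i]];
            return key;
        }

        /** How many leading positions of this ordering are bound (non-null) in the pattern. */
        int boundPrefix(String[] pattern) {
            int n = 0;
            while (n < 4 && pattern[order[n]] != null) n++;
            return n;
        }
    }

    // Positions: G=0, S=1, P=2, O=3. Orderings GSPO, GPOS, GOSP, SPOG, POSG, OSPG:
    // every combination of bound positions is a prefix of at least one of them.
    private final List<Index> indexes = List.of(
            new Index(0, 1, 2, 3), new Index(0, 2, 3, 1), new Index(0, 3, 1, 2),
            new Index(1, 2, 3, 0), new Index(2, 3, 1, 0), new Index(3, 1, 2, 0));

    /** Adds a quad to every index (one index at a time, so not atomic across indexes). */
    public void add(Quad q) {
        for (Index idx : indexes) idx.keys.add(idx.permute(q.fields()));
    }

    /** Returns quads matching the pattern; null means "any value". */
    public List<Quad> find(String g, String s, String p, String o) {
        String[] pattern = {g, s, p, o};
        int bound = 0;
        for (String f : pattern) if (f != null) bound++;

        // Prefer an index whose leading positions are exactly the bound ones.
        Index best = indexes.get(0);
        for (Index idx : indexes) {
            if (idx.boundPrefix(pattern) == bound) { best = idx; break; }
        }
        int prefixLen = best.boundPrefix(pattern);

        // Smallest key with that prefix: the remaining positions become "" (the least String).
        String[] lower = best.permute(pattern);
        for (int i = prefixLen; i < 4; i++) lower[i] = "";

        List<Quad> result = new ArrayList<>();
        for (String[] key : best.keys.tailSet(lower, true)) {
            // Keys sharing the prefix are contiguous, so stop at the first one that differs.
            if (!Arrays.equals(key, 0, prefixLen, lower, 0, prefixLen)) break;
            String[] c = new String[4];
            for (int i = 0; i < 4; i++) c[best.order[i]] = key[i];
            if (matches(pattern, c)) result.add(new Quad(c[0], c[1], c[2], c[3]));
        }
        return result;
    }

    private static boolean matches(String[] pattern, String[] quad) {
        for (int i = 0; i < 4; i++) {
            if (pattern[i] != null && !pattern[i].equals(quad[i])) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        QuadIndexSketch store = new QuadIndexSketch();
        store.add(new Quad("g1", "alice", "knows", "bob"));
        store.add(new Quad("g1", "bob", "knows", "carol"));
        store.add(new Quad("g2", "alice", "name", "Alice"));
        // Only the subject is bound: a range scan over the SPOG index.
        System.out.println(store.find(null, "alice", null, null).size()); // 2
        // Graph and object bound: a range scan over the GOSP index.
        System.out.println(store.find("g1", null, null, "carol"));
    }
}
```

`ConcurrentSkipListSet` lets each index be read and written concurrently without locks, but `add` updates the six indexes one after another, so a concurrent reader may see a quad in one index and not yet in another. A real design along the lines of the issue would need transactions or multi-version snapshots on top of this, and would likely store compact node IDs rather than strings to keep memory use down.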
> This is a system project suitable for someone interested in database
> implementation, data structure design and implementation, operating systems or
> distributed systems.
> Note that TDB can operate in-memory using a simulated disk with
> copy-in/copy-out semantics for disk-level operations. It is for faithfully
> testing TDB infrastructure and is not designed for performance, general in-memory
> use or use at scale. While lessons may be learnt from that system, TDB
> in-memory is not the answer here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)