[jira] [Commented] (JENA-624) Develop a new in-memory RDF Dataset implementation

Stian Soiland-Reyes (JIRA) Wed, 18 Mar 2015 06:23:40 -0700

    [ 
https://issues.apache.org/jira/browse/JENA-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367114#comment-14367114
 ]


Stian Soiland-Reyes commented on JENA-624:
------------------------------------------

jena-arq has these Dataset implementations:

https://github.com/apache/jena/blob/master/jena-arq/src/main/java/com/hp/hpl/jena/sparql/core/DatasetImpl.java
https://github.com/apache/jena/blob/master/jena-arq/src/main/java/com/hp/hpl/jena/sparql/core/DatasetGraphSimpleMem.java



A Dataset can be seen as a collection of named Graphs. 
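
To make that picture concrete, here is a minimal sketch in plain Java collections. It does not use Jena's API: the Triple and NamedGraphDataset classes are hypothetical names made up for illustration, terms are plain strings rather than Jena Nodes, and a null graph name is assumed to mean the default graph.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Hypothetical triple of plain strings (Jena itself uses Node objects). */
final class Triple {
    final String s, p, o;

    Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }

    @Override public boolean equals(Object x) {
        if (!(x instanceof Triple)) return false;
        Triple t = (Triple) x;
        return s.equals(t.s) && p.equals(t.p) && o.equals(t.o);
    }

    @Override public int hashCode() { return Objects.hash(s, p, o); }
}

/** Hypothetical dataset: one default graph plus graphs keyed by name. */
class NamedGraphDataset {
    private final Set<Triple> defaultGraph = new HashSet<>();
    private final Map<String, Set<Triple>> namedGraphs = new HashMap<>();

    /** Adds a triple to the named graph, or to the default graph when graphName is null. */
    void add(String graphName, Triple t) {
        if (graphName == null) {
            defaultGraph.add(t);
        } else {
            namedGraphs.computeIfAbsent(graphName, k -> new HashSet<>()).add(t);
        }
    }

    /** Checks one graph only; a lookup never creates a graph as a side effect. */
    boolean contains(String graphName, Triple t) {
        if (graphName == null) return defaultGraph.contains(t);
        return namedGraphs.getOrDefault(graphName, Collections.emptySet()).contains(t);
    }

    /** Names of the graphs that currently exist (read-only view). */
    Set<String> graphNames() {
        return Collections.unmodifiableSet(namedGraphs.keySet());
    }

    public static void main(String[] args) {
        NamedGraphDataset ds = new NamedGraphDataset();
        ds.add(null, new Triple("ex:a", "ex:knows", "ex:b"));
        ds.add("ex:g1", new Triple("ex:b", "ex:knows", "ex:c"));
        System.out.println(ds.graphNames());
    }
}
```

A "map of independent graphs" like this is easy to write, but each graph is a separate container, so a query across all graphs has to visit every one of them.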

jena-core has an in-memory Graph model here: 
https://github.com/apache/jena/blob/master/jena-core/src/main/java/com/hp/hpl/jena/mem/GraphTripleStoreBase.java


(You will have to hunt around the different class hierarchies in an IDE to get 
a better understanding.)


jena-tdb has a model based on memory-mapped files with efficient lookup for 
queries. You can ask for it to run in memory-only mode (basically using a 
byte[], I believe), but this GSOC proposal is to take inspiration from the 
efficient bits of TDB (e.g. the index trees) and create an in-memory Dataset 
implementation that performs well with complex SPARQL queries and larger 
datasets (e.g. a couple of GBs). Thus you should not need to work at byte 
level, but you will still need efficient data structures (in terms of both 
speed and memory usage).
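
As a rough illustration of the "index trees" idea, here is a sketch that keeps every triple in three sorted orderings (SPO, POS, OSP), so that a pattern with bound terms can be answered from whichever index has those terms as a prefix. This is only a sketch under my own assumptions, not TDB's code and not a proposed design: the TripleIndexes class is a hypothetical name, terms are plain strings, TreeMap/TreeSet stand in for real index structures, and graph names (quads) are left out to keep it short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: each triple is stored three times, as SPO, POS and OSP. */
class TripleIndexes {
    // Each index maps first term -> second term -> sorted set of third terms.
    private final Map<String, Map<String, Set<String>>> spo = new TreeMap<>();
    private final Map<String, Map<String, Set<String>>> pos = new TreeMap<>();
    private final Map<String, Map<String, Set<String>>> osp = new TreeMap<>();

    void add(String s, String p, String o) {
        put(spo, s, p, o);
        put(pos, p, o, s);
        put(osp, o, s, p);
    }

    private static void put(Map<String, Map<String, Set<String>>> index,
                            String a, String b, String c) {
        index.computeIfAbsent(a, k -> new TreeMap<>())
             .computeIfAbsent(b, k -> new TreeSet<>())
             .add(c);
    }

    /** Matches a pattern where null is a wildcard; results are {s, p, o} arrays. */
    List<String[]> find(String s, String p, String o) {
        List<String[]> out = new ArrayList<>();
        if (s != null && p == null && o != null) {
            // (s ? o): OSP has (o, s) as a prefix.
            for (String[] t : scan(osp, o, s, null)) out.add(new String[] {t[1], t[2], t[0]});
        } else if (s != null) {
            // (s ? ?), (s p ?), (s p o): SPO.
            out.addAll(scan(spo, s, p, o));
        } else if (p != null) {
            // (? p ?), (? p o): POS.
            for (String[] t : scan(pos, p, o, null)) out.add(new String[] {t[2], t[0], t[1]});
        } else {
            // (? ? o) uses OSP; (? ? ?) is a full scan of the same index.
            for (String[] t : scan(osp, o, null, null)) out.add(new String[] {t[1], t[2], t[0]});
        }
        return out;
    }

    /** Walks one index in its own term order; a null term means "all keys at this level". */
    private static List<String[]> scan(Map<String, Map<String, Set<String>>> index,
                                       String a, String b, String c) {
        List<String[]> out = new ArrayList<>();
        Map<String, Map<String, Set<String>>> level1 = (a == null) ? index : only(index, a);
        for (Map.Entry<String, Map<String, Set<String>>> e1 : level1.entrySet()) {
            Map<String, Set<String>> level2 =
                (b == null) ? e1.getValue() : only(e1.getValue(), b);
            for (Map.Entry<String, Set<String>> e2 : level2.entrySet()) {
                for (String third : e2.getValue()) {
                    if (c == null || c.equals(third)) {
                        out.add(new String[] {e1.getKey(), e2.getKey(), third});
                    }
                }
            }
        }
        return out;
    }

    /** Restricts a map to a single key, or to nothing if the key is absent. */
    private static <V> Map<String, V> only(Map<String, V> m, String key) {
        V v = m.get(key);
        if (v == null) return Collections.<String, V>emptyMap();
        return Collections.singletonMap(key, v);
    }

    public static void main(String[] args) {
        TripleIndexes idx = new TripleIndexes();
        idx.add("ex:a", "ex:knows", "ex:b");
        idx.add("ex:b", "ex:knows", "ex:c");
        System.out.println(idx.find(null, "ex:knows", null).size());
    }
}
```

Storing each triple three times trades memory for lookup speed, which is exactly the balance the project would have to get right at a couple of GBs. Repeating full strings in every index is the obvious waste here; a real design would need something more compact.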


> Develop a new in-memory RDF Dataset implementation
> --------------------------------------------------
>
>                 Key: JENA-624
>                 URL: https://issues.apache.org/jira/browse/JENA-624
>             Project: Apache Jena
>          Issue Type: Improvement
>            Reporter: Andy Seaborne
>              Labels: gsoc, gsoc2015, java, linked_data, rdf
>
> The current (Jan 2014) Jena in-memory dataset uses a general purpose 
> container that works for any storage technology for graphs together with 
> in-memory graphs.  
> This project would develop a new implementation design specifically for RDF 
> datasets (triples and quads) and efficient SPARQL execution, for example, 
> using multi-core parallel operations and/or multi-version concurrent 
> datastructures to maximise true parallel operation.
> This is a system project suitable for someone interested in database 
> implementation, datastructure design and implementation, operating systems or 
> distributed systems.
> Note that TDB can operate in-memory using a simulated disk with 
> copy-in/copy-out semantics for disk-level operations.  It is for faithful 
> testing of TDB infrastructure and is not designed for performance, general 
> in-memory use or use at scale.  While lessons may be learnt from that system, TDB 
> in-memory is not the answer here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
