[
https://issues.apache.org/jira/browse/JENA-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17194957#comment-17194957
]
Claus Stadler edited comment on JENA-1894 at 9/13/20, 10:07 AM:
----------------------------------------------------------------
Update: I have made progress generalizing the initial nested-map
implementation to an arbitrary in-memory tuple index.
Nested storage structures (or 'stores'? I am unsure of the appropriate
terminology) can now be constructed using the following pattern:
{code:java}
TupleAccessor<Quad, Node> accessor = new TupleAccessorQuad();
StorageManager<Quad, Node, Map<Node, Map<Node, Map<Node, Map<Node, Quad>>>>> storage =
    StorageComposers.innerMap(3, LinkedHashMap::new,
        StorageComposers.innerMap(0, LinkedHashMap::new,
            StorageComposers.innerMap(1, LinkedHashMap::new,
                StorageComposers.leafMap(2, accessor, LinkedHashMap::new))));
Map<Node, Map<Node, Map<Node, Map<Node, Quad>>>> rootTyped = storage.newStore();
Object root = rootTyped;
Quad q1 = SSE.parseQuad("(:g1 :s1 :g1p1 :g1o1)");
storage.add(root, q1);
storage.remove(root, q1);
{code}
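A minimal, Jena-free sketch of the same composition idea follows. It is not
the code from the branch: the names Level, innerMap, leafMap and gspoIndex are
hypothetical, tuples are plain lists of strings laid out as (s, p, o, g), and
that layout is an assumption made only for this illustration. It shows how each
level can create its own map, descend by one tuple component, and prune empty
sub-maps on removal while the caller only ever handles an Object root.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: every name here is illustrative and is not the
// StorageManager/StorageComposers API from the branch.
public class NestedMapIndexSketch {

    /** One level of a nested-map index over tuples of strings. */
    public interface Level {
        Object newStore();
        boolean add(Object store, List<String> tuple);
        boolean remove(Object store, List<String> tuple);
    }

    /** Innermost level: maps the component at tupleIdx to the tuple itself. */
    public static Level leafMap(int tupleIdx, Supplier<Map<String, Object>> maps) {
        return new Level() {
            @Override public Object newStore() { return maps.get(); }

            @SuppressWarnings("unchecked")
            @Override public boolean add(Object store, List<String> tuple) {
                // put returns null only if the key was not present before
                return ((Map<String, Object>) store).put(tuple.get(tupleIdx), tuple) == null;
            }

            @SuppressWarnings("unchecked")
            @Override public boolean remove(Object store, List<String> tuple) {
                return ((Map<String, Object>) store).remove(tuple.get(tupleIdx)) != null;
            }
        };
    }

    /** Inner level: maps the component at tupleIdx to a store of the child level. */
    public static Level innerMap(int tupleIdx, Supplier<Map<String, Object>> maps, Level child) {
        return new Level() {
            @Override public Object newStore() { return maps.get(); }

            @SuppressWarnings("unchecked")
            @Override public boolean add(Object store, List<String> tuple) {
                Map<String, Object> map = (Map<String, Object>) store;
                Object sub = map.computeIfAbsent(tuple.get(tupleIdx), k -> child.newStore());
                return child.add(sub, tuple);
            }

            @SuppressWarnings("unchecked")
            @Override public boolean remove(Object store, List<String> tuple) {
                Map<String, Object> map = (Map<String, Object>) store;
                Object sub = map.get(tuple.get(tupleIdx));
                if (sub == null || !child.remove(sub, tuple)) {
                    return false;
                }
                // Prune the sub-map once it no longer holds any tuple
                if (((Map<?, ?>) sub).isEmpty()) {
                    map.remove(tuple.get(tupleIdx));
                }
                return true;
            }
        };
    }

    /** g -> s -> p -> o nesting, assuming the (s, p, o, g) layout stated above. */
    public static Level gspoIndex() {
        return innerMap(3, LinkedHashMap::new,
               innerMap(0, LinkedHashMap::new,
               innerMap(1, LinkedHashMap::new,
               leafMap(2, LinkedHashMap::new))));
    }

    public static void main(String[] args) {
        Level index = gspoIndex();
        Object root = index.newStore();
        List<String> q1 = Arrays.asList("s1", "p1", "o1", "g1");
        index.add(root, q1);
        System.out.println(root); // nested maps keyed g, s, p, o
        index.remove(root, q1);
        System.out.println(root); // empty again after pruning
    }
}
```

Because every level receives its map supplier separately, swapping
LinkedHashMap for another Map implementation at a single level would not
require touching the other levels.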
I am not sure about the name 'StorageManager', but its purpose is to
* create actual storage instances according to its specification. In the code
above everything is strongly typed, but the insert/delete/lookup operations
of the StorageManager all work with Object (and internally cast it according
to the specification)
* perform inserts and deletes on such an instance [works now]
* provide an interface to test whether a given pattern matches the keys of the
nested maps in the right order, which allows distinct values to be served
directly without a scan [in progress]
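The third point is still in progress, so the following is only a guess at
how such a test could look, not the actual interface. The class
DistinctKeysSketch and its methods insert and distinctValues are hypothetical.
The sketch assumes tuples are String arrays laid out as (s, p, o, g), that null
marks an unbound component, and that the requested component itself is unbound.
The index can answer directly only if all components nested above the requested
one are bound and all components nested below it are unbound. In that case the
key set of one nested map is the result, in insert order.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: names and signatures are illustrative assumptions.
public class DistinctKeysSketch {

    /** Inserts a tuple into nested maps whose levels follow the given component order. */
    @SuppressWarnings("unchecked")
    public static void insert(Map<String, Object> root, int[] order, String[] tuple) {
        Map<String, Object> current = root;
        for (int i = 0; i < order.length - 1; i++) {
            current = (Map<String, Object>) current.computeIfAbsent(
                    tuple[order[i]], k -> new LinkedHashMap<String, Object>());
        }
        current.put(tuple[order[order.length - 1]], tuple);
    }

    /**
     * Returns the distinct values of component target in insert order, or
     * null if the index cannot answer directly and a scan would be needed.
     */
    @SuppressWarnings("unchecked")
    public static Set<String> distinctValues(Map<String, Object> root, int[] order,
                                             String[] pattern, int target) {
        Map<String, Object> current = root;
        for (int i = 0; i < order.length; i++) {
            int idx = order[i];
            if (idx == target) {
                // Constraints on deeper levels would require filtering
                for (int j = i + 1; j < order.length; j++) {
                    if (pattern[order[j]] != null) {
                        return null;
                    }
                }
                return Collections.unmodifiableSet(current.keySet());
            }
            String key = pattern[idx];
            if (key == null) {
                return null; // unbound level above the target: scan needed
            }
            Object next = current.get(key);
            if (next == null) {
                return Collections.emptySet(); // bound prefix matches nothing
            }
            if (!(next instanceof Map)) {
                break; // reached a stored tuple without meeting the target
            }
            current = (Map<String, Object>) next;
        }
        return null;
    }

    public static void main(String[] args) {
        int[] gspo = {3, 0, 1, 2};
        Map<String, Object> root = new LinkedHashMap<>();
        insert(root, gspo, new String[] {"s2", "p1", "o1", "g1"});
        insert(root, gspo, new String[] {"s1", "p1", "o1", "g1"});
        // Distinct subjects of graph g1, served from one key set
        System.out.println(distinctValues(root, gspo, new String[] {null, null, null, "g1"}, 0));
    }
}
```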
Complete example with output of the nested structure is here:
[TestTupleTableCore.java|https://github.com/Aklakan/jena/blob/e6ae3557865011597d59d4e3963a99d52555ac0b/jena-db/jena-dboe-storage/src/test/java/org/apache/jena/dboe/storage/storage/TestTupleTableCore.java#L115]
Since I found no existing support for such nested (LinkedHash)Map structures, I
hope I am not reimplementing existing infrastructure.
Side remark: The design was also inspired by encountering
{code}
public class FourTupleMap extends PMap<Node, ThreeTupleMap, FourTupleMap> ...
public static class ThreeTupleMap extends PMap<Node, TwoTupleMap, ThreeTupleMap> ...
{code}
and thinking about whether there was a way to make that scale to arbitrary
tuple sizes.
> Insert-order preserving dataset
> -------------------------------
>
> Key: JENA-1894
> URL: https://issues.apache.org/jira/browse/JENA-1894
> Project: Apache Jena
> Issue Type: Improvement
> Components: ARQ
> Affects Versions: Jena 3.14.0
> Reporter: Claus Stadler
> Priority: Major
>
> To the best of my knowledge, there is no backend for datasets that retains
> insert order.
> This feature is particularly useful when changing RDF files in a git
> repository, as it makes for nice commits. An insert-order preserving
> Triple/QuadTable implementation enables:
> * Writing (subject-grouped) RDF files or events from an RDF stream out in
> nearly the same way they were read in - this makes it easier to compare
> outputs of data transformations
> * Combining ORDER BY with CONSTRUCT queries:
> {code:java}
> Dataset ds = DatasetFactory.createOrderPreservingDataset();
> QueryExecutionFactory.create("CONSTRUCT WHERE { ?s ?p ?o } ORDER BY ?s ?p ?o", ds);
> RDFDataMgr.write(System.out, ds, RDFFormat.TURTLE_BLOCKS);
> {code}
> I created an implementation of this some time ago. The main classes of the
> machinery are:
> * [QuadTableFromNestedMaps.java|https://github.com/SmartDataAnalytics/jena-sparql-api/blob/a18b069e963bdef6cc9e8915f3e8f766893bab15/jena-sparql-api-rx/src/main/java/org/aksw/jena_sparql_api/rx/QuadTableFromNestedMaps.java#L26]
> * In addition, I created a lazy (but adequate?) wrapper for re-using a quad
> table as a triple table:
> [TripleTableFromQuadTable.java|https://github.com/SmartDataAnalytics/jena-sparql-api/blob/a18b069e963bdef6cc9e8915f3e8f766893bab15/jena-sparql-api-rx/src/main/java/org/aksw/jena_sparql_api/rx/TripleTableFromQuadTable.java#L30]
> * The DatasetGraph wrapper:
> [DatasetGraphQuadsImpl.java|https://github.com/SmartDataAnalytics/jena-sparql-api/blob/a18b069e963bdef6cc9e8915f3e8f766893bab15/jena-sparql-api-rx/src/main/java/org/aksw/jena_sparql_api/rx/DatasetGraphQuadsImpl.java#L32]
> The actual factory code then uses:
> {code:java}
> public static DatasetGraph createOrderPreservingDatasetGraph() {
>     QuadTable quadTable = new QuadTableFromNestedMaps();
>     TripleTable tripleTable = new TripleTableFromQuadTable(quadTable);
>     DatasetGraph result = new DatasetGraphInMemory(quadTable, tripleTable);
>     return result;
> }
> {code}
> Note that DatasetGraphQuadsImpl at present falsely claims to be
> transaction-aware, because otherwise any SPARQL insert caused an exception
> (I have not yet tried with the latest fixes for 3.15.0-SNAPSHOT). In any
> case, transactions may not even be necessary for the use case of writing out
> RDF, but if there is an easy way to add them, then it should be done.
> An example of the above code in action is here:
> [Git Diff based on ordered turtle-blocks output|https://github.com/SmartDataAnalytics/lodservatory/commit/ec50cd33230a771c557c1ed2751799401ea3fd89]
> The downside of this kind of order-preserving dataset is that it essentially
> features only a gspo index. Hence its performance characteristics (it is
> intended mostly for serialization or presentation) vary greatly from those
> of the query-optimized implementations.
> In any case, order preserving datasets are a highly useful feature for Jena
> and I'd gladly contribute a PR for that. My main questions are:
> * What should the factory methods in DatasetFactory, DatasetGraphFactory etc.
> be called - createOrderPreservingDataset?
> * Is the approach using QuadTableFromNestedMaps needed - or can a different
> implementation of QuadTable be repurposed?
> * It seems that the abstract class DatasetGraphQuads does not have any
> implementation, at least in ARQ and the Jena modules I use (according to
> Eclipse) - so my custom implementation of DatasetGraphQuadsImpl seems to be
> needed, or is there a similar class lying around in another Jena package?