[ https://issues.apache.org/jira/browse/JENA-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200273#comment-17200273 ]
Claus Stadler edited comment on JENA-1894 at 9/22/20, 6:01 PM:
---------------------------------------------------------------

I should add that I had a discussion with the main author, Alexander Bigerl, today, and so far he is fine with me turning this into a PR for Jena - provided that he gets cited :)

The main questions are:
(.) Is there interest in this?
(.) If so, where should it go? The engine itself should probably go into a separate module; the storage system has also grown quite large, but it might still fit into dboe.

was (Author: aklakan):
I should add that I had a discussion with the main author Alexander Bigerl today and so far he is fine if I turned this into a PR for jena - provided that he gets cited :)

> Insert-order preserving dataset
> -------------------------------
>
>                 Key: JENA-1894
>                 URL: https://issues.apache.org/jira/browse/JENA-1894
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ
>    Affects Versions: Jena 3.14.0
>            Reporter: Claus Stadler
>            Priority: Major
>
> To the best of my knowledge, there is no backend for datasets that retains
> insert order.
> This feature is particularly useful when changing RDF files in a git
> repository, as it makes for nice commits.
> An insert-order preserving Triple/QuadTable implementation enables:
> * Writing (subject-grouped) RDF files or events from an RDF stream out in
> nearly the same way they were read in - this makes it easier to compare
> outputs of data transformations
> * Combining ORDER BY with CONSTRUCT queries:
> {code:java}
> Dataset ds = DatasetFactory.createOrderPreservingDataset();
> QueryExecutionFactory.create("CONSTRUCT WHERE { ?s ?p ?o } ORDER BY ?s ?p ?o", ds);
> RDFDataMgr.write(System.out, ds, RDFFormat.TURTLE_BLOCKS);
> {code}
> I created an implementation of this some time ago. The main classes
> of the machinery are:
> * [QuadTableFromNestedMaps.java|https://github.com/SmartDataAnalytics/jena-sparql-api/blob/a18b069e963bdef6cc9e8915f3e8f766893bab15/jena-sparql-api-rx/src/main/java/org/aksw/jena_sparql_api/rx/QuadTableFromNestedMaps.java#L26]
> * In addition, I created a lazy (but adequate?) wrapper for re-using a quad
> table as a triple table:
> [TripleTableFromQuadTable.java|https://github.com/SmartDataAnalytics/jena-sparql-api/blob/a18b069e963bdef6cc9e8915f3e8f766893bab15/jena-sparql-api-rx/src/main/java/org/aksw/jena_sparql_api/rx/TripleTableFromQuadTable.java#L30]
> * The DatasetGraph wrapper:
> [DatasetGraphQuadsImpl.java|https://github.com/SmartDataAnalytics/jena-sparql-api/blob/a18b069e963bdef6cc9e8915f3e8f766893bab15/jena-sparql-api-rx/src/main/java/org/aksw/jena_sparql_api/rx/DatasetGraphQuadsImpl.java#L32]
> The actual factory code then uses:
> {code:java}
> public static DatasetGraph createOrderPreservingDatasetGraph() {
>     QuadTable quadTable = new QuadTableFromNestedMaps();
>     TripleTable tripleTable = new TripleTableFromQuadTable(quadTable);
>     DatasetGraph result = new DatasetGraphInMemory(quadTable, tripleTable);
>     return result;
> }
> {code}
> Note that DatasetGraphQuadsImpl at present falsely claims that it is
> transaction aware - because otherwise any SPARQL insert caused an exception
> (I have not tried with the latest
> fixes for 3.15.0-SNAPSHOT yet). In any
> case, for the use case of writing out RDF, transactions may not even be
> necessary, but if there is an easy way to add them, then it should be done.
> An example of the above code in action is here: [Git Diff based on ordered
> turtle-blocks output|https://github.com/SmartDataAnalytics/lodservatory/commit/ec50cd33230a771c557c1ed2751799401ea3fd89]
> The downside of using this kind of order preserving dataset is that
> essentially it only features a gspo index. Hence, the performance
> characteristics of this kind of order preserving dataset - which is intended
> mostly for serialization or presentation - vary greatly from those of the
> query-optimized implementations.
> In any case, order preserving datasets are a highly useful feature for Jena
> and I'd gladly contribute a PR for that. My main questions are:
> * What should the factory methods in DatasetFactory, DatasetGraphFactory etc.
> be called - createOrderPreservingDataset?
> * Is the approach using QuadTableFromNestedMaps needed - or can a different
> implementation of QuadTable be repurposed?
> * It seems that the abstract class DatasetGraphQuads does not have any
> implementation, at least in ARQ and the Jena modules I use (according to
> Eclipse) - so my custom implementation of DatasetGraphQuadsImpl seems to be
> needed, or is there a similar class lying around in another Jena package?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
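
To illustrate the single gspo index mentioned in the issue, here is a minimal sketch of an insert-order preserving quad store built from nested maps. It is not the linked QuadTableFromNestedMaps code and uses no Jena types: the class name NestedMapQuadStore, its methods, and the use of plain strings for nodes are all assumptions made for illustration. It relies only on java.util.LinkedHashMap iterating in insertion order (Java 10+ for var).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not a Jena class and not the linked implementation. */
final class NestedMapQuadStore {
    // graph -> subject -> predicate -> object -> marker.
    // LinkedHashMap iterates in insertion order at every level.
    private final Map<String, Map<String, Map<String, Map<String, Boolean>>>> gspo =
            new LinkedHashMap<>();

    /** Adds a quad; re-adding an existing quad keeps its original position. */
    void add(String g, String s, String p, String o) {
        gspo.computeIfAbsent(g, k -> new LinkedHashMap<>())
            .computeIfAbsent(s, k -> new LinkedHashMap<>())
            .computeIfAbsent(p, k -> new LinkedHashMap<>())
            .putIfAbsent(o, Boolean.TRUE);
    }

    /** Returns matching quads as [g, s, p, o] lists; null acts as a wildcard. */
    List<List<String>> find(String g, String s, String p, String o) {
        List<List<String>> out = new ArrayList<>();
        for (var ge : select(gspo, g))
            for (var se : select(ge.getValue(), s))
                for (var pe : select(se.getValue(), p))
                    for (var oe : select(pe.getValue(), o))
                        out.add(List.of(ge.getKey(), se.getKey(), pe.getKey(), oe.getKey()));
        return out;
    }

    // A bound key is a direct lookup; a wildcard scans the whole level in insertion order.
    private static <V> Iterable<Map.Entry<String, V>> select(Map<String, V> level, String key) {
        if (key == null) return level.entrySet();
        V value = level.get(key);
        if (value == null) return List.of();
        return List.of(Map.entry(key, value));
    }

    public static void main(String[] args) {
        NestedMapQuadStore store = new NestedMapQuadStore();
        store.add("g", "s2", "p", "o1");
        store.add("g", "s1", "p", "o1");
        store.add("g", "s2", "p", "o2");
        // prints [[g, s2, p, o1], [g, s2, p, o2], [g, s1, p, o1]]
        System.out.println(store.find(null, null, null, null));
    }
}
```

In this sketch the output is grouped by graph and subject in first-insertion order, which is why it comes out in nearly, but not exactly, the order it was read in. A pattern with an unbound graph or subject has to scan the outer levels, which is the performance trade-off of having only a gspo index.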