[ https://issues.apache.org/jira/browse/JENA-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503788#comment-17503788 ]
Claus Stadler edited comment on JENA-2302 at 3/9/22, 7:00 PM:
--------------------------------------------------------------

Here are the performance results (millisecond granularity) for larger data (200MB), created with this [benchmark runner|https://github.com/Aklakan/jena/blob/c698e61b59b8e8e7ebb0ae5341c4fed57b4b4676/jena-arq/src/main/java/org/apache/jena/riot/rowset/rw/RowSetJSONStreamingBenchmark.java#L61] (I will remove it from the repo when done).

Assuming I didn't mess something up, the results suggest that the streaming approach ("actual") can process about 3x as much data (or even more) in a given time frame as the non-streaming one ("expected"):

{code:bash}
Time taken for iteration0:expected:setup: 7.793s
Time taken for iteration0:expected:consumption: 0.061000004s
Time taken for iteration0:actual:setup: 0.15300001s
Time taken for iteration0:actual:consumption: 2.094s
Result sets are equal - items seen: 219441
Time taken for iteration1:expected:setup: 6.5320005s
Time taken for iteration1:expected:consumption: 0.022000002s
Time taken for iteration1:actual:setup: 0.0s
Time taken for iteration1:actual:consumption: 1.6650001s
Result sets are equal - items seen: 219441
...
Time taken for iteration20:expected:setup: 6.2060003s
Time taken for iteration20:expected:consumption: 0.012s
Time taken for iteration20:actual:setup: 0.0s
Time taken for iteration20:actual:consumption: 2.137s
Result sets are equal - items seen: 219441
...
Time taken for iteration29:expected:setup: 6.2460003s
Time taken for iteration29:expected:consumption: 0.012s
Time taken for iteration29:actual:setup: 0.0s
Time taken for iteration29:actual:consumption: 2.2870002s
Result sets are equal - items seen: 219441
{code}

was (Author: aklakan):
Here the performance results (millisecond granularity) for larger data (200MB) created with this [benchmark runner|https://github.com/Aklakan/jena/blob/c698e61b59b8e8e7ebb0ae5341c4fed57b4b4676/jena-arq/src/main/java/org/apache/jena/riot/rowset/rw/RowSetJSONStreamingBenchmark.java#L61] (I will be removed from the repo when done)

Under the assumption that I didn't mess something up then the results suggest that the streaming approach ("actual") can within a time frame process 3x the amount of data (or even better) compared to the non-streaming one ("expected"):

{code:bash}
Time taken for iteration0:expected:setup: 7.793s
Time taken for iteration0:expected:consumption: 0.061000004s
Time taken for iteration0:actual:setup: 0.15300001s
Time taken for iteration0:actual:consumption: 2.094s
Result sets are equal - items seen: 219441
Time taken for iteration1:expected:setup: 6.5320005s
Time taken for iteration1:expected:consumption: 0.022000002s
Time taken for iteration1:actual:setup: 0.0s
Time taken for iteration1:actual:consumption: 1.6650001s
Result sets are equal - items seen: 219441
...
Time taken for iteration20:expected:setup: 6.2060003s
Time taken for iteration20:expected:consumption: 0.012s
Time taken for iteration20:actual:setup: 0.0s
Time taken for iteration20:actual:consumption: 2.137s
Result sets are equal - items seen: 219441
...
Time taken for iteration29:expected:setup: 6.2460003s
Time taken for iteration29:expected:consumption: 0.012s
Time taken for iteration29:actual:setup: 0.0s
Time taken for iteration29:actual:consumption: 2.2870002s
Result sets are equal - items seen: 219441
{code}

> RowSetReaderJSON is not streaming
> ---------------------------------
>
>                 Key: JENA-2302
>                 URL: https://issues.apache.org/jira/browse/JENA-2302
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ
>    Affects Versions: Jena 4.5.0
>            Reporter: Claus Stadler
>            Priority: Major
>
> Retrieving all data from our TDB2 endpoint with Jena 4.5.0-SNAPSHOT is no
> longer streaming for the JSON format. I tracked the issue to RowSetReaderJson,
> which reads everything into memory (and then checks whether it is a SPARQL
> ASK result):
> {code:java}
> public class RowSetReaderJson {
>     private void parse(InputStream in) {
>         JsonObject obj = JSON.parse(in); // !!! Loads everything !!!
>         // Boolean?
>         if ( obj.hasKey(kBoolean) ) { ... }
>     }
> }
> {code}
> Streaming works when switching to RS_XML in the example below:
> {code:java}
> public class Main {
>     public static void main(String[] args) {
>         System.out.println("Test Started");
>         try (QueryExecution qe = QueryExecutionHTTP.create()
>                 .acceptHeader(ResultSetLang.RS_JSON.getContentType().getContentTypeStr())
>                 .endpoint("http://moin.aksw.org/sparql")
>                 .queryString("SELECT * { ?s ?p ?o }")
>                 .build()) {
>             qe.execSelect().forEachRemaining(System.out::println);
>         }
>         System.out.println("Done");
>     }
> }
> {code}
> For completeness, I can rule out any problem with TDB2 because streaming of
> JSON works just fine with:
> {code:bash}
> curl --data-urlencode "query=select * { ?s ?p ?o }" "http://moin.aksw.org/sparql"
> {code}
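
The difference between the two approaches, and why the benchmark reports its cost under "setup" for one and under "consumption" for the other, can be illustrated with a minimal sketch. This is not Jena code: it uses plain lines instead of JSON bindings, and the names `EagerVsStreaming`, `CountingReader`, `eager` and `streaming` are hypothetical, chosen only for illustration. It assumes nothing beyond the Java standard library.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration only; not part of Jena.
public class EagerVsStreaming {

    /** Reader that counts how many lines have been pulled from the input. */
    static final class CountingReader extends BufferedReader {
        int linesRead = 0;

        CountingReader(String data) {
            super(new StringReader(data));
        }

        @Override
        public String readLine() throws IOException {
            String line = super.readLine();
            if (line != null) {
                linesRead++;
            }
            return line;
        }
    }

    /** Eager: the whole input is materialized in memory during setup. */
    static Iterator<String> eager(BufferedReader in) {
        List<String> rows = new ArrayList<>();
        try {
            for (String line; (line = in.readLine()) != null; ) {
                rows.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows.iterator();
    }

    /** Streaming: setup reads nothing; each row is read when it is requested. */
    static Iterator<String> streaming(BufferedReader in) {
        return new Iterator<String>() {
            private String pending;   // row read ahead by hasNext(), if any
            private boolean finished; // true once the input is exhausted

            @Override
            public boolean hasNext() {
                if (pending == null && !finished) {
                    try {
                        pending = in.readLine();
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                    finished = (pending == null);
                }
                return pending != null;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                String row = pending;
                pending = null;
                return row;
            }
        };
    }

    public static void main(String[] args) {
        String data = "row1\nrow2\nrow3\n";

        CountingReader a = new CountingReader(data);
        eager(a);
        System.out.println("eager, lines read after setup: " + a.linesRead);         // 3

        CountingReader b = new CountingReader(data);
        Iterator<String> rows = streaming(b);
        System.out.println("streaming, lines read after setup: " + b.linesRead);     // 0
        rows.next();
        System.out.println("streaming, lines read after first row: " + b.linesRead); // 1
    }
}
```

In the eager variant all input is consumed before the first row is handed out, which matches the pattern of a long "setup" and a near-zero "consumption" time. In the streaming variant the work moves into iteration, and memory use no longer has to grow with the size of the result.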