[
https://issues.apache.org/jira/browse/JENA-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503788#comment-17503788
]
Claus Stadler edited comment on JENA-2302 at 3/9/22, 7:00 PM:
--------------------------------------------------------------
Here are the performance results (millisecond granularity) for larger data (200MB)
created with this [benchmark
runner|https://github.com/Aklakan/jena/blob/c698e61b59b8e8e7ebb0ae5341c4fed57b4b4676/jena-arq/src/main/java/org/apache/jena/riot/rowset/rw/RowSetJSONStreamingBenchmark.java#L61]
(I will remove it from the repo when done)
Assuming I didn't mess something up, the results suggest that the streaming
approach ("actual") can process roughly 3x the amount of data in the same time
frame compared to the non-streaming one ("expected"):
{code:bash}
Time taken for iteration0:expected:setup: 7.793s
Time taken for iteration0:expected:consumption: 0.061000004s
Time taken for iteration0:actual:setup: 0.15300001s
Time taken for iteration0:actual:consumption: 2.094s
Result sets are equal - items seen: 219441
Time taken for iteration1:expected:setup: 6.5320005s
Time taken for iteration1:expected:consumption: 0.022000002s
Time taken for iteration1:actual:setup: 0.0s
Time taken for iteration1:actual:consumption: 1.6650001s
Result sets are equal - items seen: 219441
...
Time taken for iteration20:expected:setup: 6.2060003s
Time taken for iteration20:expected:consumption: 0.012s
Time taken for iteration20:actual:setup: 0.0s
Time taken for iteration20:actual:consumption: 2.137s
Result sets are equal - items seen: 219441
...
Time taken for iteration29:expected:setup: 6.2460003s
Time taken for iteration29:expected:consumption: 0.012s
Time taken for iteration29:actual:setup: 0.0s
Time taken for iteration29:actual:consumption: 2.2870002s
Result sets are equal - items seen: 219441
{code}
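To make the "roughly 3x" estimate easy to check, here is a small sketch that adds setup and consumption time per iteration and divides the totals. The timings are copied from the log above and rounded to milliseconds; the class and method names (ThroughputRatio, total, ratio) are made up for this illustration and are not part of the benchmark runner.

```java
import java.util.Locale;

class ThroughputRatio {
    /** Total wall time of one run: setup plus consumption, in seconds. */
    static double total(double setupSeconds, double consumptionSeconds) {
        return setupSeconds + consumptionSeconds;
    }

    /** Factor by which the faster run outpaces the slower one over the same data. */
    static double ratio(double slowerTotal, double fasterTotal) {
        return slowerTotal / fasterTotal;
    }

    public static void main(String[] args) {
        // {setup, consumption} in seconds, copied from the log and rounded to ms.
        int[] iteration = { 0, 1, 20, 29 };
        double[][] expected = { {7.793, 0.061}, {6.532, 0.022}, {6.206, 0.012}, {6.246, 0.012} };
        double[][] actual   = { {0.153, 2.094}, {0.000, 1.665}, {0.000, 2.137}, {0.000, 2.287} };

        for (int i = 0; i < iteration.length; i++) {
            double e = total(expected[i][0], expected[i][1]);
            double a = total(actual[i][0], actual[i][1]);
            System.out.printf(Locale.ROOT,
                    "iteration%d: expected=%.3fs actual=%.3fs ratio=%.2f%n",
                    iteration[i], e, a, ratio(e, a));
        }
    }
}
```

For the four iterations shown, the ratio comes out between about 2.7 and 3.9, which is consistent with the "roughly 3x" reading. The iterations elided by "..." are not covered by this check.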
> RowSetReaderJSON is not streaming
> ---------------------------------
>
> Key: JENA-2302
> URL: https://issues.apache.org/jira/browse/JENA-2302
> Project: Apache Jena
> Issue Type: Improvement
> Components: ARQ
> Affects Versions: Jena 4.5.0
> Reporter: Claus Stadler
> Priority: Major
>
> Retrieving all data from our TDB2 endpoint with Jena 4.5.0-SNAPSHOT is no
> longer streaming for the JSON format. I tracked the issue to RowSetReaderJson,
> which reads everything into memory (and then checks whether it is a SPARQL
> ASK result):
> {code:java}
> public class RowSetReaderJson {
>     private void parse(InputStream in) {
>         JsonObject obj = JSON.parse(in); // !!! Loads everything !!!
>         // Boolean?
>         if ( obj.hasKey(kBoolean) ) { ... }
>     }
> }
> {code}
> Streaming works when switching to RS_XML in the example below:
> {code:java}
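> import org.apache.jena.query.QueryExecution;
> import org.apache.jena.riot.resultset.ResultSetLang;
> import org.apache.jena.sparql.exec.http.QueryExecutionHTTP;
>
> public class Main {
>     public static void main(String[] args) {
>         System.out.println("Test Started");
>         // Requests JSON results; swap RS_JSON for RS_XML to compare.
>         try (QueryExecution qe = QueryExecutionHTTP.create()
>                 .acceptHeader(ResultSetLang.RS_JSON.getContentType().getContentTypeStr())
>                 .endpoint("http://moin.aksw.org/sparql")
>                 .queryString("SELECT * { ?s ?p ?o }")
>                 .build()) {
>             qe.execSelect().forEachRemaining(System.out::println);
>         }
>         System.out.println("Done");
>     }
> }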
> {code}
> For completeness, I can rule out any problem with TDB2 because streaming of
> JSON works just fine with:
> {code:bash}
> curl --data-urlencode "query=select * { ?s ?p ?o }" \
>   "http://moin.aksw.org/sparql"
> {code}
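As a rough illustration of the behaviour described in the issue above, the following sketch contrasts a reader that decodes every row before returning with one that decodes rows on demand. It does not use any Jena API: RowSource, eager and streaming are hypothetical names for this example, and RowSource merely counts how many rows have been "decoded" in place of a real JSON parser.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class EagerVsStreaming {
    /** Hypothetical stand-in for a parser; tracks how many rows were decoded. */
    static final class RowSource implements Iterator<String> {
        private final int size;
        private int decoded = 0;

        RowSource(int size) { this.size = size; }

        int decoded() { return decoded; }

        @Override public boolean hasNext() { return decoded < size; }

        @Override public String next() {
            if (!hasNext()) throw new NoSuchElementException();
            return "row" + decoded++;
        }
    }

    /** Eager: decodes every row during setup, then iterates an in-memory list. */
    static Iterator<String> eager(RowSource source) {
        List<String> all = new ArrayList<>();
        source.forEachRemaining(all::add);
        return all.iterator();
    }

    /** Streaming: returns the source itself, so rows are decoded on demand. */
    static Iterator<String> streaming(RowSource source) {
        return source;
    }

    public static void main(String[] args) {
        RowSource a = new RowSource(5);
        eager(a);
        System.out.println("eager, decoded after setup: " + a.decoded());

        RowSource b = new RowSource(5);
        Iterator<String> rows = streaming(b);
        System.out.println("streaming, decoded after setup: " + b.decoded());
        rows.next();
        System.out.println("streaming, decoded after first row: " + b.decoded());
    }
}
```

In the eager variant all of the work happens before the first row is available and memory use grows with the result size; in the streaming variant the first row is available immediately and the cost is spread over iteration.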