[ https://issues.apache.org/jira/browse/JENA-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502348#comment-17502348 ]

Andy Seaborne commented on JENA-2302:
-------------------------------------

{quote}is no longer streaming{quote}

I'm not clear here - what version was streaming?

The JSON result reader has been non-streaming since at least Jena 3.6.0; 
RowSetReaderJSON is a port of the previous ResultSetReaderJSON.

In XML, the order of elements is prescribed by the XML schema: the {{<head>}} 
element comes before {{<results>}}, and each appears exactly once. Streaming is 
therefore possible, e.g. with StAX.
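
For illustration, a minimal StAX sketch using only the JDK's {{javax.xml.stream}} API (this is not Jena's own XML reader). Because {{<head>}} precedes {{<results>}}, the variables are known before the first row, and each {{<result>}} can be handed to the caller as soon as its end tag is read. The class and method names are hypothetical, and the sketch keeps only term values (datatypes, language tags and ASK {{<boolean>}} results are ignored).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxResultsSketch {

    // Hypothetical helper: returns the variables from <head> and passes each
    // <result> to onRow as soon as its end tag has been read.
    static List<String> stream(InputStream in, Consumer<Map<String, String>> onRow)
            throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        List<String> vars = new ArrayList<>();
        Map<String, String> row = null;
        String binding = null;
        try {
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    switch (r.getLocalName()) {
                        // <head> comes first, so variables are known before any row
                        case "variable" -> vars.add(r.getAttributeValue(null, "name"));
                        case "result" -> row = new LinkedHashMap<>();
                        case "binding" -> binding = r.getAttributeValue(null, "name");
                        // Term value only; getElementText() reads up to the end tag
                        case "uri", "literal", "bnode" -> row.put(binding, r.getElementText());
                        default -> { }
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && r.getLocalName().equals("result")) {
                    onRow.accept(row); // emit now; later rows have not been read yet
                    row = null;
                }
            }
        } finally {
            r.close(); // releases the reader; does not close the underlying stream
        }
        return vars;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = """
            <sparql xmlns="http://www.w3.org/2005/sparql-results#">
              <head><variable name="s"/><variable name="o"/></head>
              <results>
                <result>
                  <binding name="s"><uri>http://example/a</uri></binding>
                  <binding name="o"><literal>1</literal></binding>
                </result>
                <result><binding name="s"><uri>http://example/b</uri></binding></result>
              </results>
            </sparql>
            """;
        List<String> vars = stream(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                row -> System.out.println("row: " + row));
        System.out.println("vars: " + vars);
    }
}
```

The rows are printed before the variable list is returned, which is the point: nothing after the current {{<result>}} needs to be buffered.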

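JSON offers no such guarantee: the members of an object are unordered, so 
{{"results"}} may legitimately come before {{"head"}}. What is more, in JSON a 
key can appear twice in the same object; conventionally, the second occurrence 
takes precedence.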

It would be possible to parse optimistically, but handing a partially read, 
buffered stream over to a streaming parser once the input is known to be 
stream-suitable is not a simple matter.
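
A minimal sketch of only the first step of optimistic parsing, assuming a JDK {{BufferedInputStream}} and a hypothetical 64-byte look-ahead limit: mark the stream, peek at whether the first key of the top-level object is {{"head"}}, then reset so the same bytes can go to either parser. The class and method names are hypothetical. It deliberately leaves out the hard parts: a later duplicate {{"head"}} key (which would take precedence), and falling back once a streaming parser has already delivered rows.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class OptimisticPeekSketch {

    // Hypothetical look-ahead budget; a real reader would need to choose this carefully.
    static final int PEEK_LIMIT = 64;

    // Returns true if the top-level JSON object's first key is "head".
    // The stream is reset before returning, so no bytes are lost.
    static boolean firstKeyIsHead(BufferedInputStream in) throws IOException {
        in.mark(PEEK_LIMIT);
        try {
            byte[] prefixBytes = in.readNBytes(PEEK_LIMIT);
            String prefix = new String(prefixBytes, StandardCharsets.UTF_8).stripLeading();
            if (!prefix.startsWith("{")) {
                return false;
            }
            // Textual check only: escaped spellings of the key are not recognised.
            return prefix.substring(1).stripLeading().startsWith("\"head\"");
        } finally {
            in.reset(); // rewind to the mark for whichever parser is chosen
        }
    }

    public static void main(String[] args) throws IOException {
        String streamOrder = "{ \"head\": {\"vars\": [\"s\"]}, \"results\": {\"bindings\": []} }";
        String otherOrder = "{ \"results\": {\"bindings\": []}, \"head\": {\"vars\": [\"s\"]} }";
        for (String json : new String[] {streamOrder, otherOrder}) {
            BufferedInputStream in = new BufferedInputStream(
                    new ByteArrayInputStream(json.getBytes(StandardCharsets.UTF_8)));
            boolean streamable = firstKeyIsHead(in);
            // The complete document, first byte included, is still readable here.
            String all = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            System.out.println(streamable + " " + all.equals(json));
        }
    }
}
```

Even this easy case only tells you where the document starts; whether the rest of it is stream-suitable is only known after the fact.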

Parsing the results as JSON and then processing the JSON data structure is 
robust, including against partial failures.

Fuseki writes in stream order ({{head}} before {{results}}), but the parser is 
general. A separate, Fuseki-specific parser is possible.

The fastest streaming choice with Fuseki is the binary 
{{application/sparql-results+thrift}} format.


> RowSetReaderJSON is not streaming
> ---------------------------------
>
>                 Key: JENA-2302
>                 URL: https://issues.apache.org/jira/browse/JENA-2302
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: Jena 4.5.0
>            Reporter: Claus Stadler
>            Priority: Major
>
> Retrieving all data from our TDB2 endpoint with Jena 4.5.0-SNAPSHOT is no 
> longer streaming for the JSON format. I tracked the issue to RowSetReaderJson, 
> which reads everything into memory (and then checks whether it is a SPARQL 
> ASK result):
> {code:java}
> public class RowSetReaderJson {
>         private void parse(InputStream in) {
>             JsonObject obj = JSON.parse(in); // !!! Loads everything !!!
>             // Boolean?
>             if ( obj.hasKey(kBoolean) ) { ... }
>     }
> }
> {code}
> Streaming works when switching to RS_XML in the example below:
> {code:java}
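> import org.apache.jena.query.QueryExecution;
> import org.apache.jena.riot.resultset.ResultSetLang;
> import org.apache.jena.sparql.exec.http.QueryExecutionHTTP;
>
> public class Main {
>     public static void main(String[] args) {
>         System.out.println("Test Started");
>         // Ask the endpoint for SPARQL JSON results (use RS_XML for the comparison)
>         try (QueryExecution qe = QueryExecutionHTTP.create()
>                 .acceptHeader(ResultSetLang.RS_JSON.getContentType().getContentTypeStr())
>                 .endpoint("http://moin.aksw.org/sparql")
>                 .queryString("SELECT * { ?s ?p ?o }")
>                 .build()) {
>             // Print each row as it is consumed from the result set
>             qe.execSelect().forEachRemaining(System.out::println);
>         }
>         System.out.println("Done");
>     }
> }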
> {code}
> For completeness, I can rule out any problem with TDB2 because streaming of 
> JSON works just fine with: 
> {code:bash}
> curl --data-urlencode "query=select * { ?s ?p ?o }" "http://moin.aksw.org/sparql"
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
