[
https://issues.apache.org/jira/browse/JENA-178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168904#comment-13168904
]
Rob Vesse commented on JENA-178:
--------------------------------
From my attached test case I get the following results:
Took 0.068863659s to do execSelect()
Took 9.720043614s to do outputAsXML()
Took 0.77990671s to do fromXML() and iterate over result set
Took 3.27026E-4s to do execSelect()
Took 6.075415556s to do outputAsJSON()
Took 1.291390952s to do fromJSON() and iterate over result set
Took 3.0173E-4s to do execSelect()
Took 0.382692228s to do outputAsTSV()
Took 0.821681056s to do fromTSV() and iterate over result set
So there is some HTTP overhead involved in my original figures, but essentially
it is still the case that XML serialization is roughly 25x slower than TSV and
JSON serialization is roughly 16x slower than TSV.
Parsing all these formats is pretty fast, so it looks to be the serializers that
are at fault.
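
As a quick check of the arithmetic, the slowdown factors can be recomputed from the serialization timings listed above. This is only a sketch: the class and method names are made up for illustration, and the constants are copied from the outputAsXML(), outputAsJSON() and outputAsTSV() lines of this comment.

```java
public class SerializationRatios {
    // Serialization times in seconds, copied from the figures reported above.
    public static final double XML_SECONDS = 9.720043614;
    public static final double JSON_SECONDS = 6.075415556;
    public static final double TSV_SECONDS = 0.382692228;

    /** Returns how many times slower a format is than the baseline format. */
    public static double slowdown(double seconds, double baselineSeconds) {
        if (baselineSeconds <= 0) {
            throw new IllegalArgumentException("baseline must be positive");
        }
        return seconds / baselineSeconds;
    }

    public static void main(String[] args) {
        System.out.printf("XML vs TSV:  %.1fx%n", slowdown(XML_SECONDS, TSV_SECONDS));
        System.out.printf("JSON vs TSV: %.1fx%n", slowdown(JSON_SECONDS, TSV_SECONDS));
    }
}
```

The same helper could be applied to the parsing times, where the three formats are much closer together.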
> SPARQL Results serialization and parsing is slow with large result sets
> -----------------------------------------------------------------------
>
> Key: JENA-178
> URL: https://issues.apache.org/jira/browse/JENA-178
> Project: Jena
> Issue Type: Bug
> Components: ARQ
> Affects Versions: ARQ 2.8.9
> Environment: Windows 7 Enterprise 64 bit
> Reporter: Rob Vesse
> Attachments: TestArqSerializerPerformance.java
>
>
> The SPARQL XML and JSON Result formats are very slow when the result set is
> large. This is surprising to me since both formats are relatively simple and
> should lend themselves to fairly fast streaming serialization and parsing.
> The following are observed performance figures comparing SPARQL XML, SPARQL
> JSON and SPARQL TSV results format. This is the averaged time over 5 runs to
> retrieve the first 50,000 triples from the dataset with a simple SELECT *
> WHERE { ?s ?p ?o } LIMIT 50000 via a HTTP request to Fuseki and iterate over
> the results on the client.
> SPARQL XML = 15.25 seconds
> SPARQL JSON = 10.9 seconds
> SPARQL TSV = 0.54 seconds
> Now obviously TSV is way simpler to serialize and parse than XML/JSON, but
> these serializers and parsers should not be 20-30 times slower IMO.
> Also for comparison, note that doing an equivalent CONSTRUCT { ?s ?p ?o }
> WHERE { ?s ?p ?o } LIMIT 50000 takes only about 2s, and that is using RDF/XML
> serialization, which I would have expected to be slower because RDF/XML is
> more complex to generate than either SPARQL XML/JSON results. I haven't
> dived into the code in detail to investigate why this is slow yet, but do the
> Jena team have any thoughts on this?
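
For anyone wanting to reproduce figures averaged over several runs, as in the description above, a minimal timing helper might look like the following. It is a sketch under assumptions, not the attached test case: the class name, method name and placeholder workload are hypothetical, and the real measurement would replace the workload with the query, serialization and iteration steps.

```java
public class AverageTiming {
    /** Runs the task the given number of times and returns the mean wall-clock time in seconds. */
    public static double averageSeconds(Runnable task, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1e9 / runs;
    }

    public static void main(String[] args) {
        // Placeholder workload (an assumption): stands in for executing the
        // query, serializing the results and iterating over them.
        Runnable placeholder = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 50000; i++) {
                sb.append(i).append('\t');
            }
        };
        System.out.printf("Average over 5 runs: %.6fs%n", averageSeconds(placeholder, 5));
    }
}
```

Note that this does no JVM warm-up, so the first run can dominate the average for short workloads.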