Danny Ayers wrote:
I just read a blog post [1] from Shelley Powers in which she talks about
JSON vs XML and goes into RDF/XML vs Turtle territory a bit. Seems like
a lot of the potential XML tool interop that a 'nice' RDF/XML* might
have provided is now available through SPARQL results.
While commenting over there it occurred to me that one aspect that
doesn't seem to be available is a streaming style of access a la SAX.
Ok, this is completely off the top of my head, may well be a non-starter
- any obvious reasons it couldn't work? If not, has anyone looked
into/implemented this? Would say hooking up result iterators to
just-in-time XML generation make sense?
It did to me, for both the XML and JSON results formats.
The writing side is quite natural, although ARQ outputs XML/JSON directly rather
than through a writer library. The formats are so simple that using a writer
library turned out to be more effort than just doing it.
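
As a rough illustration of "just doing it", here is a minimal sketch of writing SPARQL XML results straight from a row iterator, one row at a time. It is not ARQ's code: the class name StreamingResultsWriter, the use of a plain Map per row, and the choice to emit every value as a plain literal are all assumptions made to keep the example short.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not ARQ's implementation.
public class StreamingResultsWriter {

    // Escape the characters that are unsafe in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Writes each row as soon as the iterator yields it; no row is buffered.
    // Assumption: every bound value is written as a plain literal.
    static void write(List<String> vars, Iterator<Map<String, String>> rows, Writer out)
            throws IOException {
        out.write("<?xml version=\"1.0\"?>\n");
        out.write("<sparql xmlns=\"http://www.w3.org/2005/sparql-results#\">\n");
        out.write("  <head>\n");
        for (String v : vars) {
            out.write("    <variable name=\"" + escape(v) + "\"/>\n");
        }
        out.write("  </head>\n  <results>\n");
        while (rows.hasNext()) {
            Map<String, String> row = rows.next();
            out.write("    <result>\n");
            for (String v : vars) {
                String value = row.get(v);
                if (value == null) continue; // unbound variable: no binding element
                out.write("      <binding name=\"" + escape(v) + "\"><literal>"
                        + escape(value) + "</literal></binding>\n");
            }
            out.write("    </result>\n");
        }
        out.write("  </results>\n</sparql>\n");
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("s", "a < b");
        StringWriter sw = new StringWriter();
        write(List.of("s", "o"), List.of(row).iterator(), sw);
        System.out.print(sw);
    }
}
```

Because the writer only ever holds the current row, the memory used does not grow with the size of the result set, provided the iterator behind it is itself streaming.
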
For reading, I used StAX for XML. StAX allows the application to control the
rate of processing so you get end-to-end streaming.
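
To make the pull style concrete, here is a small sketch of a results reader built on the standard javax.xml.stream API. It is an illustration under assumptions, not ARQ's reader: the class name StaxResultsReader is made up, each row is returned as a simple Map, and datatypes and language tags are ignored. The point is that the parser only advances when the application asks for the next row.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch, not ARQ's implementation.
public class StaxResultsReader implements Iterator<Map<String, String>> {
    private final XMLStreamReader xml;
    private Map<String, String> nextRow; // one row of read-ahead, filled by hasNext()

    public StaxResultsReader(Reader in) throws XMLStreamException {
        this.xml = XMLInputFactory.newInstance().createXMLStreamReader(in);
    }

    @Override
    public boolean hasNext() {
        if (nextRow == null) nextRow = readRow();
        return nextRow != null;
    }

    @Override
    public Map<String, String> next() {
        if (!hasNext()) throw new NoSuchElementException();
        Map<String, String> row = nextRow;
        nextRow = null;
        return row;
    }

    // Pulls parser events only until the end of the next <result> element.
    private Map<String, String> readRow() {
        try {
            Map<String, String> row = null;
            String var = null;
            while (xml.hasNext()) {
                int event = xml.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = xml.getLocalName();
                    if (name.equals("result")) {
                        row = new LinkedHashMap<>();
                    } else if (name.equals("binding")) {
                        var = xml.getAttributeValue(null, "name");
                    } else if (row != null && var != null
                            && (name.equals("uri") || name.equals("literal") || name.equals("bnode"))) {
                        row.put(var, xml.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    String name = xml.getLocalName();
                    if (name.equals("binding")) var = null;
                    else if (name.equals("result")) return row;
                }
            }
            return null; // end of document: no more rows
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String doc = "<sparql xmlns=\"http://www.w3.org/2005/sparql-results#\">"
                + "<head><variable name=\"s\"/></head><results>"
                + "<result><binding name=\"s\"><uri>http://example/a</uri></binding></result>"
                + "</results></sparql>";
        Iterator<Map<String, String>> rows = new StaxResultsReader(new StringReader(doc));
        while (rows.hasNext()) System.out.println(rows.next());
    }
}
```

If the application stops calling next(), the parser stops consuming input, which is what gives the end-to-end behaviour described above (subject to whatever buffering the underlying stream does).
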
SAX is less good: it is event-driven on the incoming side, but the events
arrive at parser speed, with no control from the application software reading
the SPARQL results. To have SAX truly stream would mean putting the
application's results processing inside the SAX event handling, and that then
forces the rest of the API to be very unnatural.
The JSON reader (which uses the org.json library) is pull-on-stream, so the SPARQL
results reader reads in a streaming style, pulling one result row at a
time from a JSON input stream.
If the query is "SELECT * { ?s ?p ?o }" then it is streaming triples.
I'm guessing it should be
feasible but only useful with a subset of possible query patterns. Could
be sweet for performance/scale though, not to mention queries over
Jabber...
Yes - it will depend on the query processing implementation a bit, but many
(most real-life) SPARQL queries contain no structures that prevent them from
being fully streamed (see the Semantics of SPARQL paper for a case that can't be).
Streaming query execution is good because it keeps the memory footprint down.
But JDBC (in the typical default setup of the DB server) returns all results before
letting the client application start looping over them. No streaming :-(
Andy
Cheers,
Danny.
[1] http://burningbird.net/technology/learning-javascript/to-json-or-not-to-json
(* I suspect a genuinely nice RDF/XML is an unfindable Holy Grail -
either you'd have to ditch the XML tree-friendly striping style or the
graph-friendly statement style)
--
http://dannyayers.com