rvesse commented on issue #1633:
URL: https://github.com/apache/jena/issues/1633#issuecomment-1326251895
Having done this for a previous employer's CLI tools for their graph
database, which used Jena for the user-facing pieces, I can say that this is
non-trivial to achieve.
That's not to say that it isn't possible, merely to highlight that there are
a few things to be aware of if someone wanted to attempt this:
1. You likely want to make this an opt-in behaviour and **NOT** change the
existing default behaviour
- A streaming construct won't suppress duplicate triples so you could
get much larger output than expected
- If the consumer of the output doesn't cope with duplicate triples
properly this can break larger data pipelines
2. If a user opts into this behaviour you need to validate that their
selected output format is compatible with streaming.
- Jena has streaming writers for some languages but not all languages
(and this includes some that in theory could have a streaming writer but it
would be horrendously verbose e.g. RDF/XML)
- See `WriterStreamRDFPlain` (for NTriples/Turtle),
`WriterStreamRDFBlocks` (for Turtle with limited syntactic sugar),
`StreamRDF2Thrift` and `StreamRDF2Protobuf`
- Also worth noting that streaming writers will inherently produce less
compressed output, i.e. they can't use all the syntactic sugar of their
languages e.g. Turtle predicate object lists, collection shorthands etc,
because those require multiple passes over the full data to compute whether
those are usable
- I don't remember if there is a registry for streaming writers (I
remember having to hardcode an `if` structure for this at the time, but that was
~8 years ago now). There might be one now (@afs does that exist?) or it may
need introducing
- You'll need to propagate the query namespace prefixes to the streaming
writer somehow since you'll be operating with an `Iterator<Triple>` that won't
have any prefixes available unlike the `Model` you get from a normal construct
evaluation
3. Then, depending on whether you can use a streaming writer, invoke
the relevant `execConstruct()` or `execConstructTriples()` method and handle
the result accordingly
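
To make the decision flow in points 1 to 3 concrete, here is a minimal, Jena-free sketch of how a CLI might choose between the two paths. Everything in it is an assumption for illustration: the class `StreamingConstructSketch`, the `Plan` enum, the hardcoded `STREAMABLE` format list and the use of plain strings in place of triples are all hypothetical, not Jena API. In a real tool the streamed path would presumably consume the iterator from `execConstructTriples()` and the buffered path the `Model` from `execConstruct()`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StreamingConstructSketch {

    /** The two ways the CLI could evaluate and write a CONSTRUCT query. */
    enum Plan { BUFFERED_MODEL, STREAMED_TRIPLES }

    // Hypothetical hardcoded list of format names assumed to have a streaming
    // writer; a real tool would ask Jena rather than keep its own list.
    static final Set<String> STREAMABLE =
            Set.of("ntriples", "turtle", "rdf-thrift", "rdf-protobuf");

    /** Streaming is used only when the user opted in AND the format supports it. */
    static Plan choosePlan(boolean streamingRequested, String format) {
        if (!streamingRequested) {
            return Plan.BUFFERED_MODEL; // default behaviour stays unchanged
        }
        boolean streamable = STREAMABLE.contains(format.toLowerCase(Locale.ROOT));
        return streamable ? Plan.STREAMED_TRIPLES : Plan.BUFFERED_MODEL;
    }

    /** Streamed path: every triple is emitted as it arrives, duplicates included. */
    static List<String> writeStreamed(Iterator<String> triples) {
        List<String> out = new ArrayList<>();
        while (triples.hasNext()) {
            out.add(triples.next());
        }
        return out;
    }

    /** Buffered path: collecting into a set first suppresses duplicates. */
    static List<String> writeBuffered(Iterator<String> triples) {
        Set<String> unique = new LinkedHashSet<>();
        while (triples.hasNext()) {
            unique.add(triples.next());
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        // Strings stand in for triples; the first and last are duplicates.
        List<String> triples = List.of("<s> <p> <o> .", "<s> <p> <q> .", "<s> <p> <o> .");
        Plan plan = choosePlan(true, "Turtle");
        List<String> written = (plan == Plan.STREAMED_TRIPLES)
                ? writeStreamed(triples.iterator())
                : writeBuffered(triples.iterator());
        System.out.println(plan + " wrote " + written.size() + " triples");
    }
}
```

This sketch silently falls back to the buffered path when the format cannot be streamed. Rejecting the request with an error would be an equally reasonable choice if users should be told that their opt-in was not honoured.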