Claude,

How many triples does processing one XML document produce? There seem to be several ways to get a batching/buffering effect, including the current code, e.g. send the StreamRDF to a graph, then send the graph over the RDFConnection.
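That collect-then-send approach can be sketched as below - a minimal stand-in, not the real Jena StreamRDF/RDFConnection API: triples are reduced to strings and the "load over the connection" step is reduced to recording the payload (the interface and class names are hypothetical, for illustration only).

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for StreamRDF: receives triples one at a time.
interface TripleSink {
    void triple(String t);
    void finish();
}

// Collect everything into a graph-like buffer, then ship the whole
// collected graph in one payload on finish(), the way sending a
// filled graph over the connection would.
class GraphBuffer implements TripleSink {
    final List<String> triples = new ArrayList<>();
    final List<List<String>> sent = new ArrayList<>(); // stands in for HTTP requests made

    @Override public void triple(String t) { triples.add(t); }

    @Override public void finish() {
        sent.add(new ArrayList<>(triples)); // one payload for the whole document
        triples.clear();
    }
}
```

One request per document, at the cost of holding the whole graph in memory until finish.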

One of the nuisances of HTTP is the need to have payloads that are correct for both request and response. Otherwise, streaming directly to the Fuseki server would be nice, but it needs to allow for request-side abort. In fact, if you do a GSP request and stream the body, a parse error in the request will abort it; but forcing a parse error because the request side found a higher-level condition that means it wants to stop (e.g. the user presses cancel) is pretty ugly.

For SPARQL 1.2, I've suggested developing a WebSocket protocol so that interactions with the server can be more sophisticated, but that's a long way off yet.

    Andy

On 08/07/2019 17:56, Claude Warren wrote:
The case I was trying to solve was reading a largish XML document and
converting it to an RDF graph.  After a few iterations I ended up writing a
custom SAX parser that calls the RDFStream triple/quad methods.  But I
wanted a way to update a Fuseki server, so RDFConnection seemed like the
natural choice.

In some recent work for my employer I found that I like RDFConnection,
as the same code can work against a local dataset or a remote one.

Claude

On Mon, Jul 8, 2019 at 4:34 PM ajs6f <aj...@apache.org> wrote:

This "replay" buffer approach was the direction I first went in for TIM,
until turning to MVCC (speaking of MVCC, that code is probably somewhere,
since we don't squash when we merge). Looking back, one thing that helped
me move on was the potential effect of very large transactions. But in a
controlled situation like Claude's, that problem wouldn't arise.

ajs6f

On Jul 8, 2019, at 11:07 AM, Andy Seaborne <a...@apache.org> wrote:

Claude,

Good timing!

This is what RDF Delta does, and for updates rather than just StreamRDF
additions, though it's not to an RDFConnection - it's to a patch service.

With hindsight, I wonder if that would have been better as a
BufferingDatasetGraph - a DSG that keeps changes and makes the view of the
buffer and underlying DatasetGraph behave correctly (find* works and has
the right cardinality of results). It's a bit fiddly to get it all right,
but once it works it is a building block that has a lot of re-usability.
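A minimal sketch of that buffering idea, with the DatasetGraph reduced to a set of strings and find* reduced to contains() - hypothetical simplified types, not the Jena API. Adds and deletes go into side buffers, lookups answer against the combined view with the right cardinality, and commit/abort apply or discard the buffers, which is what gives "abort" over a non-transactional base:

```java
import java.util.HashSet;
import java.util.Set;

// Simplified BufferingDatasetGraph: changes are buffered and the base is
// untouched until commit(), so abort() is trivial even when the base
// store is not itself transactional.
class BufferingStore {
    final Set<String> base;
    final Set<String> added = new HashSet<>();
    final Set<String> deleted = new HashSet<>();

    BufferingStore(Set<String> base) { this.base = base; }

    void add(String q) {
        deleted.remove(q);
        if (!base.contains(q)) added.add(q);   // don't double-count: keeps cardinality right
    }

    void delete(String q) {
        added.remove(q);
        if (base.contains(q)) deleted.add(q);
    }

    boolean contains(String q) {
        return added.contains(q) || (base.contains(q) && !deleted.contains(q));
    }

    void commit() { base.addAll(added); base.removeAll(deleted); added.clear(); deleted.clear(); }
    void abort()  { added.clear(); deleted.clear(); }
}
```

The fiddly part in the real thing is making every find* pattern consult both the buffers and the base consistently; the contains() here shows the shape of that merge for the point-lookup case.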

I came across this with the SHACL work: a BufferingGraph (with
prefixes) gives "abort" of transactions to simple graphs which aren't
transactional.

But it also occurs in Fuseki with complex dataset setups like rules.

    Andy

On 08/07/2019 11:09, Claude Warren wrote:
I have written an RDFStream to RDFConnection with caching.  Basically, the
stream caches triples/quads until a limit is reached and then it writes
them to the RDFConnection.  At finish it writes any triples/quads remaining
in the cache to the RDFConnection.

Internally I cache the stream in a dataset.  I write triples to the default
graph and quads as appropriate.
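The flushing logic described here can be sketched as follows - stand-in types with triples as strings, not the real RDFStream/RDFConnection classes, and the flush merely records a batch where the real code would write to the connection:

```java
import java.util.ArrayList;
import java.util.List;

// Cache triples until the limit is reached, then push the batch;
// finish() flushes whatever remains in the cache.
class CachingStream {
    final int limit;
    final List<String> cache = new ArrayList<>();
    final List<List<String>> updates = new ArrayList<>(); // stands in for RDFConnection writes

    CachingStream(int limit) { this.limit = limit; }

    void triple(String t) {
        cache.add(t);
        if (cache.size() >= limit) flush();
    }

    void finish() {
        if (!cache.isEmpty()) flush();       // write any remainder
    }

    private void flush() {
        updates.add(new ArrayList<>(cache)); // one update per full batch
        cache.clear();
    }
}
```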
I have a couple of questions:

1) In this arrangement what does the "base" tell me?  I currently ignore it
and want to make sure I haven't missed something.

The parser saw a BASE statement.

Like PREFIX in Turtle, it can happen mid-file (e.g. when files are
concatenated).

It's not necessary because the data stream should have resolved IRIs in
it, so base is unused in a stream.

2) I capture all the prefix calls in a PrefixMapping that is accessible
from the RDFConnectionStream class.  They are not passed into the dataset
in any way.  I didn't see any method to do so and don't really think it is
needed.  Does anyone see a problem with this?
3) Does anyone have a use for this class?  If so I am happy to contribute
it, though the next question becomes what module to put it in?  Perhaps we
should have an extras package for RDFStream implementations?
Claude


