Claude,

How many triples does processing one XML document produce? There seem to be several ways to get a batching/buffering effect, including the current code, e.g. send the StreamRDF to a graph, then send the graph over the RDFConnection.
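That collect-then-send approach can be sketched as below - a minimal stand-in, not the real Jena StreamRDF/RDFConnection API: triples are reduced to strings and the "load over the connection" step is reduced to recording the payload (the interface and class names are hypothetical, for illustration only).

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for StreamRDF: receives triples one at a time.
interface TripleSink {
    void triple(String t);
    void finish();
}

// Collect everything into a graph-like buffer, then ship the whole
// collected graph in one payload on finish(), the way sending a
// filled graph over the connection would.
class GraphBuffer implements TripleSink {
    final List<String> triples = new ArrayList<>();
    final List<List<String>> sent = new ArrayList<>(); // stands in for HTTP requests made

    @Override public void triple(String t) { triples.add(t); }

    @Override public void finish() {
        sent.add(new ArrayList<>(triples)); // one payload for the whole document
        triples.clear();
    }
}
```

One request per document, at the cost of holding the whole graph in memory until finish.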

One of the nuisances of HTTP is the need to have payloads that are correct for both request and response. Otherwise, streaming directly to the Fuseki server would be nice, but it needs to allow for request-side abort. In fact, if you do a GSP request and stream the body, a parse error in the request will abort it; but forcing a parse error because the request side found a higher-level condition that means it wants to stop (e.g. the user presses cancel) is pretty ugly.

For SPARQL 1.2, I've suggested developing a WebSocket protocol so that interactions with the server can be more sophisticated, but that's a long way off yet.

    Andy

On 08/07/2019 17:56, Claude Warren wrote:
The case I was trying to solve was reading a largish XML document and
converting it to an RDF graph.  After a few iterations I ended up writing a
custom SAX parser that calls the RDFStream triple/quad methods.  But I
wanted a way to update a Fuseki server, so RDFConnection seemed like the
natural choice.

In some recent work for my employer I found that I like RDFConnection,
as the same code can work against a local dataset or a remote one.

Claude

On Mon, Jul 8, 2019 at 4:34 PM ajs6f <aj...@apache.org> wrote:

This "replay" buffer approach was the direction I first went in for TIM,
until turning to MVCC (speaking of MVCC, that code is probably somewhere,
since we don't squash when we merge). Looking back, one thing that helped
me move on was the potential effect of very large transactions. But in a
controlled situation like Claude's, that problem wouldn't arise.

ajs6f

On Jul 8, 2019, at 11:07 AM, Andy Seaborne <a...@apache.org> wrote:

Claude,

Good timing!

This is what RDF Delta does, and for updates rather than just StreamRDF
additions, though it's not to an RDFConnection - it's to a patch service.

With hindsight, I wonder if that would have been better as a
BufferingDatasetGraph - a DSG that keeps changes and makes the view of the
buffer and underlying DatasetGraph behave correctly (find* works and has
the right cardinality of results). It's a bit fiddly to get it all right,
but once it works it is a building block that has a lot of re-usability.
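A minimal sketch of that buffering idea, with the DatasetGraph reduced to a set of strings and find* reduced to contains() - hypothetical simplified types, not the Jena API. Adds and deletes go into side buffers, lookups answer against the combined view with the right cardinality, and commit/abort apply or discard the buffers, which is what gives "abort" over a non-transactional base:

```java
import java.util.HashSet;
import java.util.Set;

// Simplified BufferingDatasetGraph: changes are buffered and the base is
// untouched until commit(), so abort() is trivial even when the base
// store is not itself transactional.
class BufferingStore {
    final Set<String> base;
    final Set<String> added = new HashSet<>();
    final Set<String> deleted = new HashSet<>();

    BufferingStore(Set<String> base) { this.base = base; }

    void add(String q) {
        deleted.remove(q);
        if (!base.contains(q)) added.add(q);   // don't double-count: keeps cardinality right
    }

    void delete(String q) {
        added.remove(q);
        if (base.contains(q)) deleted.add(q);
    }

    boolean contains(String q) {
        return added.contains(q) || (base.contains(q) && !deleted.contains(q));
    }

    void commit() { base.addAll(added); base.removeAll(deleted); added.clear(); deleted.clear(); }
    void abort()  { added.clear(); deleted.clear(); }
}
```

The fiddly part in the real thing is making every find* pattern consult both the buffers and the base consistently; the contains() here shows the shape of that merge for the point-lookup case.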

I came across this with the SHACL work: a BufferingGraph (with
prefixes) gives "abort" of transactions to simple graphs which aren't
transactional.

But it also occurs in Fuseki with complex dataset setups like rules.

    Andy

On 08/07/2019 11:09, Claude Warren wrote:
I have written an RDFStream to RDFConnection with caching.  Basically, the
stream caches triples/quads until a limit is reached and then it writes
them to the RDFConnection.  At finish it writes any triples/quads remaining
in the cache to the RDFConnection.

Internally I cache the stream in a dataset.  I write triples to the default
graph and quads as appropriate.
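The flushing logic described here can be sketched as follows - stand-in types with triples as strings, not the real RDFStream/RDFConnection classes, and the flush merely records a batch where the real code would write to the connection:

```java
import java.util.ArrayList;
import java.util.List;

// Cache triples until the limit is reached, then push the batch;
// finish() flushes whatever remains in the cache.
class CachingStream {
    final int limit;
    final List<String> cache = new ArrayList<>();
    final List<List<String>> updates = new ArrayList<>(); // stands in for RDFConnection writes

    CachingStream(int limit) { this.limit = limit; }

    void triple(String t) {
        cache.add(t);
        if (cache.size() >= limit) flush();
    }

    void finish() {
        if (!cache.isEmpty()) flush();       // write any remainder
    }

    private void flush() {
        updates.add(new ArrayList<>(cache)); // one update per full batch
        cache.clear();
    }
}
```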
I have a couple of questions:

1) In this arrangement what does the "base" tell me?  I currently ignore it
and want to make sure I haven't missed something.

The parser saw a BASE statement.

Like PREFIX in Turtle, it can happen mid-file (e.g. when files are
concatenated).

It's not necessary because the data stream should have resolved IRIs in
it, so base is unused in a stream.

2) I capture all the prefix calls in a PrefixMapping that is accessible
from the RDFConnectionStream class.  They are not passed into the dataset
in any way.  I didn't see any method to do so and don't really think it is
needed.  Does anyone see a problem with this?
3) Does anyone have a use for this class?  If so I am happy to contribute
it, though the next question becomes what module to put it in?  Perhaps we
should have an extras package for RDFStream implementations?
Claude


