Re: RDFStream to RDFConnection

Andy Seaborne Wed, 10 Jul 2019 12:20:54 -0700

How big is it? One file? A module, even under jena-extras, seems a tad heavy.


Stepping back from the specifics, thinking this might be one of several:

Is this more of an example of how to do something? That could be done by publishing the source, still within the Apache legal framework.

We have jena-examples, package org/apache/jena/example/, and that gets into the release source.


Maybe that's a way without too much ceremony.

Or is it more "documentation", via the website?
Or the cwiki?

    Andy

On 09/07/2019 10:43, Claude Warren wrote:

So, the question is: should I go ahead and create a library of StreamRDF
implementations in the extras section? I could see one to do serialization
over Kafka (or other queue implementations).

On Mon, Jul 8, 2019 at 5:56 PM Claude Warren <cla...@xenei.com> wrote:

The case I was trying to solve was reading a largish XML document and
converting it to an RDF graph. After a few iterations I ended up writing a
custom SAX parser that calls the RDFStream triple/quad methods. But I
wanted a way to update a Fuseki server, so RDFConnection seemed like the
natural choice.

In some recent work for my employer I found that I like RDFConnection,
because the same code can work against a local dataset or a remote one.

Claude

On Mon, Jul 8, 2019 at 4:34 PM ajs6f <aj...@apache.org> wrote:

This "replay" buffer approach was the direction I first went in for TIM,
until turning to MVCC (speaking of MVCC, that code is probably somewhere,
since we don't squash when we merge). Looking back, one thing that helped
me move on was the potential effect of very large transactions. But in a
controlled situation like Claude's, that problem wouldn't arise.

ajs6f

On Jul 8, 2019, at 11:07 AM, Andy Seaborne <a...@apache.org> wrote:

Claude,

Good timing!

This is what RDF Delta does, and for updates rather than just StreamRDF
additions, though it's not to an RDFConnection; it's to a patch service.


With hindsight, I wonder if that would have been better as a
BufferingDatasetGraph: a DSG that keeps changes and makes the view of the
buffer plus the underlying DatasetGraph behave correctly (find* works and has
the right cardinality of results). It's a bit fiddly to get it all right,
but once it works it is a building block that has a lot of re-usability.


I came across this in the SHACL work: a BufferingGraph (with prefixes) to
give "abort" of transactions to simple graphs, which aren't transactional.


But it also comes up in Fuseki with complex dataset setups such as rules.
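One way to picture the BufferingDatasetGraph idea is as an overlay: pending additions and deletions are recorded beside an untouched base, `find` merges the two so every match appears exactly once, and abort simply discards the buffer. The sketch below is a hypothetical simplification over a plain `Set<T>` (standing in for the triples or quads of a Jena graph or dataset); `BufferedView` and its methods are illustrative names, not Jena API, and it ignores concurrency and indexed pattern lookup.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/**
 * Hypothetical, simplified "buffering" view: changes are kept in a buffer
 * layered over a base set, which is only modified on flush().
 */
class BufferedView<T> {
    private final Set<T> base;
    private final Set<T> added = new LinkedHashSet<>();   // never contains base items
    private final Set<T> deleted = new HashSet<>();       // only contains base items

    BufferedView(Set<T> base) {
        this.base = base;
    }

    void add(T t) {
        deleted.remove(t);                     // re-adding cancels a pending delete
        if (!base.contains(t)) added.add(t);   // already in base: nothing to record
    }

    void delete(T t) {
        added.remove(t);                       // undo a pending add
        if (base.contains(t)) deleted.add(t);  // tombstone only for base items
    }

    /** find: (base minus deletions) plus additions; each match appears once. */
    Set<T> find(Predicate<T> pattern) {
        Set<T> result = base.stream()
                .filter(t -> !deleted.contains(t))
                .filter(pattern)
                .collect(Collectors.toCollection(LinkedHashSet::new));
        added.stream().filter(pattern).forEach(result::add);
        return result;
    }

    /** Commit: apply the buffered changes to the base, then clear the buffer. */
    void flush() {
        base.removeAll(deleted);
        base.addAll(added);
        abort();
    }

    /** Abort: drop the buffered changes; the base was never touched. */
    void abort() {
        added.clear();
        deleted.clear();
    }
}
```

The cardinality guarantee rests on the two invariants noted in the field comments: `added` never holds an item already in the base, and `deleted` only holds items that are in it. That is also what lets a non-transactional base get a working "abort".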

    Andy

On 08/07/2019 11:09, Claude Warren wrote:

I have written an RDFStream to RDFConnection with caching. Basically, the
stream caches triples/quads until a limit is reached and then it writes
them to the RDFConnection. At finish it writes any triples/quads remaining
in the cache to the RDFConnection.
Internally I cache the stream in a dataset. I write triples to the default
graph and quads as appropriate.
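A minimal sketch of the caching pattern described above, written against plain Java collections rather than Jena so that it stands alone. `BufferingSink`, `add` and `finish` are hypothetical names: in a real `StreamRDF`, `triple()`/`quad()` would buffer into a `DatasetGraph`, and the flush callback would push the batch to the `RDFConnection` (for example with `loadDataset`). That wiring is an assumption for illustration, not Claude's actual class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical stand-in for a caching StreamRDF: items (triples/quads in the
 * real thing) are buffered until the limit is reached, then handed to the
 * sink (e.g. an RDFConnection load, in Jena) as one batch.
 */
class BufferingSink<T> {
    private final int limit;
    private final Consumer<List<T>> sink;
    private final List<T> buffer = new ArrayList<>();

    BufferingSink(int limit, Consumer<List<T>> sink) {
        if (limit <= 0) throw new IllegalArgumentException("limit must be positive");
        this.limit = limit;
        this.sink = sink;
    }

    /** Plays the role of StreamRDF.triple()/quad(): buffer, flush when full. */
    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= limit) flush();
    }

    /** Plays the role of StreamRDF.finish(): write whatever is left. */
    void finish() {
        flush();
    }

    private void flush() {
        if (buffer.isEmpty()) return;
        sink.accept(new ArrayList<>(buffer)); // hand over a copy so the buffer can be reused
        buffer.clear();
    }
}
```

With a limit of 2, adding three items sends one batch of two immediately and a batch of one at `finish()`. Passing the sink a copy keeps later additions from leaking into a batch that has already been sent.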
I have a couple of questions:
1) In this arrangement, what does the "base" tell me? I currently ignore it
and want to make sure I haven't missed something.


The parser saw a BASE statement.

Like PREFIX in Turtle, it can appear mid-file (e.g. when files are
concatenated).

It's not necessary, because the data stream should already have resolved
IRIs in it, so a stream consumer does not need the base.
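To illustrate why a StreamRDF consumer can usually ignore `base()`: the parser applies each BASE as it meets it, so the same relative IRI in two concatenated files arrives in the stream as two different absolute IRIs. This sketch uses `java.net.URI` (RFC 2396 resolution) as a stand-in for the parser's own IRI resolver; `BaseResolution` and the example IRIs are hypothetical, for illustration only.

```java
import java.net.URI;

/**
 * Hypothetical illustration: resolving a relative IRI against the current
 * BASE, as a parser does before emitting triples. java.net.URI is used here
 * only as a stand-in for a real RDF parser's IRI resolver.
 */
class BaseResolution {
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        // Two concatenated Turtle files, each with its own BASE: the same
        // relative IRI <item> becomes two different absolute IRIs.
        System.out.println(resolve("http://example/one/", "item")); // http://example/one/item
        System.out.println(resolve("http://example/two/", "item")); // http://example/two/item
    }
}
```

By the time a triple reaches the stream, only the resolved form is present, which is why the `base()` call carries no information the consumer still needs.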

2) I capture all the prefix calls in a PrefixMapping that is accessible
from the RDFConnectionStream class. They are not passed into the dataset
in any way. I didn't see any method to do so and don't really think it is
needed. Does anyone see a problem with this?
3) Does anyone have a use for this class? If so, I am happy to contribute
it, though the next question becomes what module to put it in. Perhaps we
should have an extras package for RDFStream implementations?
Claude


--
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
