Re: RDFStream to RDFConnection

2019-07-10 Thread ajs6f
+1 to a spot in jena-examples with a write-up on our site.

ajs6f

> On Jul 10, 2019, at 3:20 PM, Andy Seaborne  wrote:
> 
> How big is it - one file?  A module, even under jena-extras, seems a tad heavy.
> 
> Stepping back from the specifics, thinking this might be one of several:
> 
> Is this more of an example of how to do something? That could be done by 
> publishing the source, still with the Apache legal framework.
> 
> We have jena-examples, package org/apache/jena/example/ and that gets into 
> the release source.
> 
> Maybe that's a way without too much ceremony.
> 
> Or more of a "documentation" page via the web-site?
> Or the cwiki?
> 
>Andy
> 
> On 09/07/2019 10:43, Claude Warren wrote:
>> So, the question is: should I go ahead and create a library of StreamRDF
>> implementations in the extras section?  I could see one to do serialization
>> over Kafka (or other queue implementations)?
>> On Mon, Jul 8, 2019 at 5:56 PM Claude Warren  wrote:
>>> The case I was trying to solve was reading a largish XML document and
>>> converting it to an RDF graph.  After a few iterations I ended up writing a
>>> custom SAX parser that calls the RDFStream triple/quad methods.  But I
>>> wanted a way to update a Fuseki server so RDFConnection seemed like the
>>> natural choice.
>>> 
>>> In some recent work for my employer I found that I like the RDFConnection
>>> as the same code can work against a local dataset or a remote one.
>>> 
>>> Claude
>>> 
>>> On Mon, Jul 8, 2019 at 4:34 PM ajs6f  wrote:
>>> 
>>>> This "replay" buffer approach was the direction I first went in for TIM,
>>>> until turning to MVCC (speaking of MVCC, that code is probably somewhere,
>>>> since we don't squash when we merge). Looking back, one thing that helped
>>>> me move on was the potential effect of very large transactions. But in a
>>>> controlled situation like Claude's, that problem wouldn't arise.
>>>> 
>>>> ajs6f
>>>> 
>>>>> On Jul 8, 2019, at 11:07 AM, Andy Seaborne  wrote:
>>>>> 
>>>>> Claude,
>>>>> 
>>>>> Good timing!
>>>>> 
>>>>> This is what RDF Delta does, and for updates rather than just StreamRDF
>>>>> additions, though it's not to an RDFConnection - it's to a patch service.
>>>>> 
>>>>> With hindsight, I wonder if that would have been better as
>>>>> BufferingDatasetGraph - a DSG that keeps changes and makes the view of the
>>>>> buffer and underlying DatasetGraph behave correctly (find* works and has
>>>>> the right cardinality of results). It's a bit fiddly to get it all right
>>>>> but once it works it is a building block that has a lot of reusability.
>>>>> 
>>>>> I came across this with the SHACL work for a BufferingGraph (with
>>>>> prefixes) to give "abort" of transactions to simple graphs which aren't
>>>>> transactional.
>>>>> 
>>>>> But it occurs in Fuseki with complex dataset setups like rules.
>>>>> 
>>>>>Andy
>>>>> 
>>>>> On 08/07/2019 11:09, Claude Warren wrote:
>>>>>> I have written an RDFStream to RDFConnection with caching.  Basically, the
>>>>>> stream caches triples/quads until a limit is reached and then it writes
>>>>>> them to the RDFConnection.  At finish it writes any triples/quads in the
>>>>>> cache to the RDFConnection.
>>>>>> Internally I cache the stream in a dataset.  I write triples to the default
>>>>>> dataset and quads as appropriate.
>>>>>> I have a couple of questions:
>>>>>> 1) In this arrangement what does the "base" tell me? I currently ignore it
>>>>>> and want to make sure I haven't missed something.
>>>>> 
>>>>> The parser saw a BASE statement.
>>>>> 
>>>>> Like PREFIX, in Turtle, it can happen mid-file (e.g. when files are
>>>> concatenated).
>>>>> 
>>>>> It's not necessary because the data stream should have resolved IRIs in
>>>>> it, so base goes unused in a stream.
>>>>> 
>>>>>> 2) I capture all the prefix calls in a PrefixMapping that is accessible
>>>>>> from the RDFConnectionStream class.  They are not passed into the dataset
>>>>>> in any way.  I didn't see any method to do so and don't really think it is
>>>>>> needed.  Does anyone see a problem with this?
>>>>>> 3) Does anyone have a use for this class?  If so I am happy to contribute
>>>>>> it, though the next question becomes what module to put it in?  Perhaps we
>>>>>> should have an extras package for RDFStream implementations?
>>>>>> Claude
>>>> 
>>>> 
>>> 
>>> --
>>> I like: Like Like - The likeliest place on the web
>>> <http://like-like.xenei.com>
>>> LinkedIn: http://www.linkedin.com/in/claudewarren
>>> 



Re: RDFStream to RDFConnection

2019-07-10 Thread Andy Seaborne

How big is it - one file?  A module, even under jena-extras, seems a tad heavy.

Stepping back from the specifics, thinking this might be one of several:

Is this more of an example of how to do something? That could be done 
by publishing the source, still with the Apache legal framework.


We have jena-examples, package org/apache/jena/example/ and that gets 
into the release source.


Maybe that's a way without too much ceremony.

Or more of a "documentation" page via the web-site?
Or the cwiki?

Andy

On 09/07/2019 10:43, Claude Warren wrote:

So, the question is: should I go ahead and create a library of StreamRDF
implementations in the extras section?  I could see one to do serialization
over Kafka (or other queue implementations)?

On Mon, Jul 8, 2019 at 5:56 PM Claude Warren  wrote:


The case I was trying to solve was reading a largish XML document and
converting it to an RDF graph.  After a few iterations I ended up writing a
custom SAX parser that calls the RDFStream triple/quad methods.  But I
wanted a way to update a Fuseki server so RDFConnection seemed like the
natural choice.

In some recent work for my employer I found that I like the RDFConnection
as the same code can work against a local dataset or a remote one.

Claude

On Mon, Jul 8, 2019 at 4:34 PM ajs6f  wrote:


This "replay" buffer approach was the direction I first went in for TIM,
until turning to MVCC (speaking of MVCC, that code is probably somewhere,
since we don't squash when we merge). Looking back, one thing that helped
me move on was the potential effect of very large transactions. But in a
controlled situation like Claude's, that problem wouldn't arise.

ajs6f


On Jul 8, 2019, at 11:07 AM, Andy Seaborne  wrote:

Claude,

Good timing!

This is what RDF Delta does, and for updates rather than just StreamRDF 
additions, though it's not to an RDFConnection - it's to a patch service.

With hindsight, I wonder if that would have been better as 
BufferingDatasetGraph - a DSG that keeps changes and makes the view of the
buffer and underlying DatasetGraph behave correctly (find* works and has
the right cardinality of results). It's a bit fiddly to get it all right
but once it works it is a building block that has a lot of reusability.

I came across this with the SHACL work for a BufferingGraph (with 
prefixes) to give "abort" of transactions to simple graphs which aren't
transactional.

But it occurs in Fuseki with complex dataset setups like rules.

Andy

On 08/07/2019 11:09, Claude Warren wrote:

I have written an RDFStream to RDFConnection with caching.  Basically, the
stream caches triples/quads until a limit is reached and then it writes
them to the RDFConnection.  At finish it writes any triples/quads in the
cache to the RDFConnection.

Internally I cache the stream in a dataset.  I write triples to the default
dataset and quads as appropriate.

I have a couple of questions:

1) In this arrangement what does the "base" tell me? I currently ignore it
and want to make sure I haven't missed something.


The parser saw a BASE statement.

Like PREFIX, in Turtle, it can happen mid-file (e.g. when files are 
concatenated).

It's not necessary because the data stream should have resolved IRIs in
it, so base goes unused in a stream.



2) I capture all the prefix calls in a PrefixMapping that is accessible
from the RDFConnectionStream class.  They are not passed into the dataset
in any way.  I didn't see any method to do so and don't really think it is
needed.  Does anyone see a problem with this?

3) Does anyone have a use for this class?  If so I am happy to contribute
it, though the next question becomes what module to put it in?  Perhaps we
should have an extras package for RDFStream implementations?

Claude





--
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren






Re: RDFStream to RDFConnection

2019-07-09 Thread Claude Warren
In my case one document is 2 million triples.  I set a default batch size
of 1000 (I think -- I don't have the code in front of me) but that is
overridable as a constructor parameter.  More work is needed to determine
what the proper default batch size is.

Internally I send the triples/quads to a dataset and after the batch size
is reached (or on finish()) send the dataset to the RDFConnection.  It is a
simplistic implementation but one that seems to work for my case.

Claude



On Tue, Jul 9, 2019 at 11:09 AM Andy Seaborne  wrote:

> Claude,
>
> How many triples does processing one XML document produce?  There seem
> to be several ways to get a batching/buffering effect including current
> code, e.g. send the StreamRDF to a graph, then send the graph over the
> RDFConnection?
>
> One of the nuisances of HTTP is the need to have payloads that are
> correct for both request and response.  Otherwise streaming direct to
> the Fuseki server would be nice but it needs to allow for request-side
> abort. In fact, if you do a GSP request and stream the body and the
> request has a parse error, it will abort; but forcing a parse error
> because the request side found a higher-level condition that means it
> wants to stop (e.g. the user presses cancel) is pretty ugly.
>
> For SPARQL 1.2, I've suggested developing a websockets protocol so that
> interactions with the server can be more sophisticated but that's a long
> way off yet.
>
>  Andy
>
> On 08/07/2019 17:56, Claude Warren wrote:
> > The case I was trying to solve was reading a largish XML document and
> > converting it to an RDF graph.  After a few iterations I ended up
> writing a
> > custom SAX parser that calls the RDFStream triple/quad methods.  But I
> > wanted a way to update a Fuseki server so RDFConnection seemed like the
> > natural choice.
> >
> > In some recent work for my employer I found that I like the RDFConnection
> > as the same code can work against a local dataset or a remote one.
> >
> > Claude
> >
> > On Mon, Jul 8, 2019 at 4:34 PM ajs6f  wrote:
> >
> >> This "replay" buffer approach was the direction I first went in for TIM,
> >> until turning to MVCC (speaking of MVCC, that code is probably
> somewhere,
> >> since we don't squash when we merge). Looking back, one thing that
> helped
> >> me move on was the potential effect of very large transactions. But in a
> >> controlled situation like Claude's, that problem wouldn't arise.
> >>
> >> ajs6f
> >>
> >>> On Jul 8, 2019, at 11:07 AM, Andy Seaborne  wrote:
> >>>
> >>> Claude,
> >>>
> >>> Good timing!
> >>>
> >>> This is what RDF Delta does, and for updates rather than just StreamRDF
> >>> additions, though it's not to an RDFConnection - it's to a patch service.
> >>>
> >>> With hindsight, I wonder if that would have been better as
> >>> BufferingDatasetGraph - a DSG that keeps changes and makes the view of the
> >>> buffer and underlying DatasetGraph behave correctly (find* works and has
> >>> the right cardinality of results). It's a bit fiddly to get it all right
> >>> but once it works it is a building block that has a lot of reusability.
> >>>
> >>> I came across this with the SHACL work for a BufferingGraph (with
> >>> prefixes) to give "abort" of transactions to simple graphs which aren't
> >>> transactional.
> >>>
> >>> But it occurs in Fuseki with complex dataset setups like rules.
> >>>
> >>> Andy
> >>>
> >>> On 08/07/2019 11:09, Claude Warren wrote:
> >>>> I have written an RDFStream to RDFConnection with caching.  Basically, the
> >>>> stream caches triples/quads until a limit is reached and then it writes
> >>>> them to the RDFConnection.  At finish it writes any triples/quads in the
> >>>> cache to the RDFConnection.
> >>>> Internally I cache the stream in a dataset.  I write triples to the default
> >>>> dataset and quads as appropriate.
> >>>> I have a couple of questions:
> >>>> 1) In this arrangement what does the "base" tell me? I currently ignore it
> >>>> and want to make sure I haven't missed something.
> >>>
> >>> The parser saw a BASE statement.
> >>>
> >>> Like PREFIX, in Turtle, it can happen mid-file (e.g. when files are
> >> concatenated).
> >>>
> >>> It's not necessary because the data stream should have resolved IRIs in
> >>> it, so base goes unused in a stream.

Re: RDFStream to RDFConnection

2019-07-09 Thread Andy Seaborne

Claude,

How many triples does processing one XML document produce?  There seem 
to be several ways to get a batching/buffering effect including current 
code, e.g. send the StreamRDF to a graph, then send the graph over the 
RDFConnection?


One of the nuisances of HTTP is the need to have payloads that are 
correct for both request and response.  Otherwise streaming direct to 
the Fuseki server would be nice but it needs to allow for request-side 
abort. In fact, if you do a GSP request and stream the body and the 
request has a parse error, it will abort; but forcing a parse error 
because the request side found a higher-level condition that means it 
wants to stop (e.g. the user presses cancel) is pretty ugly.


For SPARQL 1.2, I've suggested developing a websockets protocol so that 
interactions with the server can be more sophisticated but that's a long 
way off yet.


Andy

On 08/07/2019 17:56, Claude Warren wrote:

The case I was trying to solve was reading a largish XML document and
converting it to an RDF graph.  After a few iterations I ended up writing a
custom SAX parser that calls the RDFStream triple/quad methods.  But I
wanted a way to update a Fuseki server so RDFConnection seemed like the
natural choice.

In some recent work for my employer I found that I like the RDFConnection
as the same code can work against a local dataset or a remote one.

Claude

On Mon, Jul 8, 2019 at 4:34 PM ajs6f  wrote:


This "replay" buffer approach was the direction I first went in for TIM,
until turning to MVCC (speaking of MVCC, that code is probably somewhere,
since we don't squash when we merge). Looking back, one thing that helped
me move on was the potential effect of very large transactions. But in a
controlled situation like Claude's, that problem wouldn't arise.

ajs6f


On Jul 8, 2019, at 11:07 AM, Andy Seaborne  wrote:

Claude,

Good timing!

This is what RDF Delta does, and for updates rather than just StreamRDF 
additions, though it's not to an RDFConnection - it's to a patch service.

With hindsight, I wonder if that would have been better as 
BufferingDatasetGraph - a DSG that keeps changes and makes the view of the
buffer and underlying DatasetGraph behave correctly (find* works and has
the right cardinality of results). It's a bit fiddly to get it all right
but once it works it is a building block that has a lot of reusability.

I came across this with the SHACL work for a BufferingGraph (with 
prefixes) to give "abort" of transactions to simple graphs which aren't
transactional.

But it occurs in Fuseki with complex dataset setups like rules.

Andy

On 08/07/2019 11:09, Claude Warren wrote:

I have written an RDFStream to RDFConnection with caching.  Basically, the
stream caches triples/quads until a limit is reached and then it writes
them to the RDFConnection.  At finish it writes any triples/quads in the
cache to the RDFConnection.

Internally I cache the stream in a dataset.  I write triples to the default
dataset and quads as appropriate.

I have a couple of questions:

1) In this arrangement what does the "base" tell me? I currently ignore it
and want to make sure I haven't missed something.


The parser saw a BASE statement.

Like PREFIX, in Turtle, it can happen mid-file (e.g. when files are 
concatenated).

It's not necessary because the data stream should have resolved IRIs in
it, so base goes unused in a stream.



2) I capture all the prefix calls in a PrefixMapping that is accessible
from the RDFConnectionStream class.  They are not passed into the dataset
in any way.  I didn't see any method to do so and don't really think it is
needed.  Does anyone see a problem with this?

3) Does anyone have a use for this class?  If so I am happy to contribute
it, though the next question becomes what module to put it in?  Perhaps we
should have an extras package for RDFStream implementations?

Claude







Re: RDFStream to RDFConnection

2019-07-09 Thread Claude Warren
So, the question is: should I go ahead and create a library of StreamRDF
implementations in the extras section?  I could see one to do serialization
over Kafka (or other queue implementations)?

On Mon, Jul 8, 2019 at 5:56 PM Claude Warren  wrote:

> The case I was trying to solve was reading a largish XML document and
> converting it to an RDF graph.  After a few iterations I ended up writing a
> custom SAX parser that calls the RDFStream triple/quad methods.  But I
> wanted a way to update a Fuseki server so RDFConnection seemed like the
> natural choice.
>
> In some recent work for my employer I found that I like the RDFConnection
> as the same code can work against a local dataset or a remote one.
>
> Claude
>
> On Mon, Jul 8, 2019 at 4:34 PM ajs6f  wrote:
>
>> This "replay" buffer approach was the direction I first went in for TIM,
>> until turning to MVCC (speaking of MVCC, that code is probably somewhere,
>> since we don't squash when we merge). Looking back, one thing that helped
>> me move on was the potential effect of very large transactions. But in a
>> controlled situation like Claude's, that problem wouldn't arise.
>>
>> ajs6f
>>
>> > On Jul 8, 2019, at 11:07 AM, Andy Seaborne  wrote:
>> >
>> > Claude,
>> >
>> > Good timing!
>> >
>> > This is what RDF Delta does, and for updates rather than just StreamRDF
>> > additions, though it's not to an RDFConnection - it's to a patch service.
>> >
>> > With hindsight, I wonder if that would have been better as
>> > BufferingDatasetGraph - a DSG that keeps changes and makes the view of the
>> > buffer and underlying DatasetGraph behave correctly (find* works and has
>> > the right cardinality of results). It's a bit fiddly to get it all right
>> > but once it works it is a building block that has a lot of reusability.
>> >
>> > I came across this with the SHACL work for a BufferingGraph (with
>> > prefixes) to give "abort" of transactions to simple graphs which aren't
>> > transactional.
>> >
>> > But it occurs in Fuseki with complex dataset setups like rules.
>> >
>> >Andy
>> >
>> > On 08/07/2019 11:09, Claude Warren wrote:
>> >> I have written an RDFStream to RDFConnection with caching.  Basically, the
>> >> stream caches triples/quads until a limit is reached and then it writes
>> >> them to the RDFConnection.  At finish it writes any triples/quads in the
>> >> cache to the RDFConnection.
>> >> Internally I cache the stream in a dataset.  I write triples to the default
>> >> dataset and quads as appropriate.
>> >> I have a couple of questions:
>> >> 1) In this arrangement what does the "base" tell me? I currently ignore it
>> >> and want to make sure I haven't missed something.
>> >
>> > The parser saw a BASE statement.
>> >
>> > Like PREFIX, in Turtle, it can happen mid-file (e.g. when files are
>> concatenated).
>> >
>> > It's not necessary because the data stream should have resolved IRIs in
>> > it, so base goes unused in a stream.
>> >
>> >> 2) I capture all the prefix calls in a PrefixMapping that is accessible
>> >> from the RDFConnectionStream class.  They are not passed into the dataset
>> >> in any way.  I didn't see any method to do so and don't really think it is
>> >> needed.  Does anyone see a problem with this?
>> >> 3) Does anyone have a use for this class?  If so I am happy to contribute
>> >> it, though the next question becomes what module to put it in?  Perhaps we
>> >> should have an extras package for RDFStream implementations?
>> >> Claude
>>
>>
>
> --
> I like: Like Like - The likeliest place on the web
> <http://like-like.xenei.com>
> LinkedIn: http://www.linkedin.com/in/claudewarren
>


-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
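[Editorial note: Claude's Kafka suggestion above amounts to a StreamRDF whose triple/quad callbacks serialize events onto a queue rather than into a graph. A minimal sketch, using an in-memory queue as a stand-in for a Kafka producer - all names here are illustrative, not an actual Jena or Kafka API:]

```python
import queue

class QueueStream:
    """StreamRDF-like sink that serializes each event onto a queue.
    A Kafka producer's send() would replace q.put() in a real version."""

    def __init__(self, q):
        self.q = q

    def triple(self, s, p, o):
        self.q.put(("T", s, p, o))      # tagged triple event

    def quad(self, g, s, p, o):
        self.q.put(("Q", g, s, p, o))   # tagged quad event

    def prefix(self, prefix, iri):
        self.q.put(("P", prefix, iri))  # prefixes travel in-band

    def finish(self):
        self.q.put(("F",))              # end-of-stream marker


q = queue.Queue()
out = QueueStream(q)
out.prefix("ex", "http://example.com/")
out.triple("ex:s", "ex:p", "ex:o")
out.finish()
events = [q.get() for _ in range(3)]
# events: first the prefix event, then the triple, then the end marker
```

A consumer on the other side replays the events into its own StreamRDF, which is what makes this shape work over any queue implementation.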


Re: RDFStream to RDFConnection

2019-07-08 Thread Claude Warren
The case I was trying to solve was reading a largish XML document and
converting it to an RDF graph.  After a few iterations I ended up writing a
custom SAX parser that calls the RDFStream triple/quad methods.  But I
wanted a way to update a Fuseki server so RDFConnection seemed like the
natural choice.

In some recent work for my employer I found that I like the RDFConnection
as the same code can work against a local dataset or a remote one.

Claude

On Mon, Jul 8, 2019 at 4:34 PM ajs6f  wrote:

> This "replay" buffer approach was the direction I first went in for TIM,
> until turning to MVCC (speaking of MVCC, that code is probably somewhere,
> since we don't squash when we merge). Looking back, one thing that helped
> me move on was the potential effect of very large transactions. But in a
> controlled situation like Claude's, that problem wouldn't arise.
>
> ajs6f
>
> > On Jul 8, 2019, at 11:07 AM, Andy Seaborne  wrote:
> >
> > Claude,
> >
> > Good timing!
> >
> > This is what RDF Delta does, and for updates rather than just StreamRDF
> > additions, though it's not to an RDFConnection - it's to a patch service.
> >
> > With hindsight, I wonder if that would have been better as
> > BufferingDatasetGraph - a DSG that keeps changes and makes the view of the
> > buffer and underlying DatasetGraph behave correctly (find* works and has
> > the right cardinality of results). It's a bit fiddly to get it all right
> > but once it works it is a building block that has a lot of reusability.
> >
> > I came across this with the SHACL work for a BufferingGraph (with
> > prefixes) to give "abort" of transactions to simple graphs which aren't
> > transactional.
> >
> > But it occurs in Fuseki with complex dataset setups like rules.
> >
> >Andy
> >
> > On 08/07/2019 11:09, Claude Warren wrote:
> >> I have written an RDFStream to RDFConnection with caching.  Basically, the
> >> stream caches triples/quads until a limit is reached and then it writes
> >> them to the RDFConnection.  At finish it writes any triples/quads in the
> >> cache to the RDFConnection.
> >> Internally I cache the stream in a dataset.  I write triples to the default
> >> dataset and quads as appropriate.
> >> I have a couple of questions:
> >> 1) In this arrangement what does the "base" tell me? I currently ignore it
> >> and want to make sure I haven't missed something.
> >
> > The parser saw a BASE statement.
> >
> > Like PREFIX, in Turtle, it can happen mid-file (e.g. when files are
> concatenated).
> >
> > It's not necessary because the data stream should have resolved IRIs in
> > it, so base goes unused in a stream.
> >
> >> 2) I capture all the prefix calls in a PrefixMapping that is accessible
> >> from the RDFConnectionStream class.  They are not passed into the dataset
> >> in any way.  I didn't see any method to do so and don't really think it is
> >> needed.  Does anyone see a problem with this?
> >> 3) Does anyone have a use for this class?  If so I am happy to contribute
> >> it, though the next question becomes what module to put it in?  Perhaps we
> >> should have an extras package for RDFStream implementations?
> >> Claude
>
>

-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren


Re: RDFStream to RDFConnection

2019-07-08 Thread ajs6f
This "replay" buffer approach was the direction I first went in for TIM, until 
turning to MVCC (speaking of MVCC, that code is probably somewhere, since we 
don't squash when we merge). Looking back, one thing that helped me move on was 
the potential effect of very large transactions. But in a controlled situation 
like Claude's, that problem wouldn't arise.

ajs6f

> On Jul 8, 2019, at 11:07 AM, Andy Seaborne  wrote:
> 
> Claude,
> 
> Good timing!
> 
> This is what RDF Delta does, and for updates rather than just StreamRDF 
> additions, though it's not to an RDFConnection - it's to a patch service.
> 
> With hindsight, I wonder if that would have been better as 
> BufferingDatasetGraph - a DSG that keeps changes and makes the view of the 
> buffer and underlying DatasetGraph behave correctly (find* works and has the 
> right cardinality of results). It's a bit fiddly to get it all right but once 
> it works it is a building block that has a lot of reusability.
> 
> I came across this with the SHACL work for a BufferingGraph (with prefixes) 
> to give "abort" of transactions to simple graphs which aren't transactional.
> 
> But it occurs in Fuseki with complex dataset setups like rules.
> 
>    Andy
> 
> On 08/07/2019 11:09, Claude Warren wrote:
>> I have written an RDFStream to RDFConnection with caching.  Basically, the
>> stream caches triples/quads until a limit is reached and then it writes
>> them to the RDFConnection.  At finish it writes any triples/quads in the
>> cache to the RDFConnection.
>> Internally I cache the stream in a dataset.  I write triples to the default
>> dataset and quads as appropriate.
>> I have a couple of questions:
>> 1) In this arrangement what does the "base" tell me? I currently ignore it
>> and want to make sure I haven't missed something.
> 
> The parser saw a BASE statement.
> 
> Like PREFIX, in Turtle, it can happen mid-file (e.g. when files are 
> concatenated).
> 
> It's not necessary because the data stream should have resolved IRIs in it, 
> so base goes unused in a stream.
> 
>> 2) I capture all the prefix calls in a PrefixMapping that is accessible
>> from the RDFConnectionStream class.  They are not passed into the dataset
>> in any way.  I didn't see any method to do so and don't really think it is
>> needed.  Does anyone see a problem with this?
>> 3) Does anyone have a use for this class?  If so I am happy to contribute
>> it, though the next question becomes what module to put it in?  Perhaps we
>> should have an extras package for RDFStream implementations?
>> Claude



Re: RDFStream to RDFConnection

2019-07-08 Thread Andy Seaborne

Claude,

Good timing!

This is what RDF Delta does, and for updates rather than just StreamRDF 
additions, though it's not to an RDFConnection - it's to a patch service.


With hindsight, I wonder if that would have been better as 
BufferingDatasetGraph - a DSG that keeps changes and makes the view of 
the buffer and underlying DatasetGraph behave correctly (find* works and 
has the right cardinality of results). It's a bit fiddly to get it all 
right but once it works it is a building block that has a lot of 
reusability.


I came across this with the SHACL work for a BufferingGraph (with 
prefixes) to give "abort" of transactions to simple graphs which aren't 
transactional.


But it occurs in Fuseki with complex dataset setups like rules.

Andy

On 08/07/2019 11:09, Claude Warren wrote:

I have written an RDFStream to RDFConnection with caching.  Basically, the
stream caches triples/quads until a limit is reached and then it writes
them to the RDFConnection.  At finish it writes any triples/quads in the
cache to the RDFConnection.

Internally I cache the stream in a dataset.  I write triples to the default
dataset and quads as appropriate.

I have a couple of questions:

1) In this arrangement what does the "base" tell me? I currently ignore it
and want to make sure I haven't missed something.


The parser saw a BASE statement.

Like PREFIX, in Turtle, it can happen mid-file (e.g. when files are 
concatenated).


It's not necessary because the data stream should have resolved IRIs in 
it, so base goes unused in a stream.



2) I capture all the prefix calls in a PrefixMapping that is accessible
from the RDFConnectionStream class.  They are not passed into the dataset
in any way.  I didn't see any method to do so and don't really think it is
needed.  Does anyone see a problem with this?

3) Does anyone have a use for this class?  If so I am happy to contribute
it, though the next question becomes what module to put it in?  Perhaps we
should have an extras package for RDFStream implementations?

Claude
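[Editorial note: Andy's BufferingGraph/BufferingDatasetGraph idea above - keep pending changes to one side so that even a non-transactional graph gets a working "abort", while find presents the merged view - can be modelled with plain sets. This is a conceptual sketch only; the names and shape are hypothetical, not the actual Jena classes, and a real DSG must also get find* result cardinality right, as Andy notes:]

```python
class BufferingGraph:
    """Buffers adds/deletes over a base set of triples; find() sees the
    merged view.  commit() pushes the changes down; abort() drops them,
    which gives "abort" even when the base graph is not transactional."""

    def __init__(self, base):
        self.base = base          # a plain set standing in for a Graph
        self.added = set()
        self.deleted = set()

    def add(self, t):
        self.deleted.discard(t)
        self.added.add(t)

    def delete(self, t):
        self.added.discard(t)
        self.deleted.add(t)

    def find(self):
        # Merged view: base minus pending deletes, plus pending adds.
        return (self.base - self.deleted) | self.added

    def commit(self):
        self.base -= self.deleted
        self.base |= self.added
        self.added.clear()
        self.deleted.clear()

    def abort(self):
        # The non-transactional base was never touched; just drop changes.
        self.added.clear()
        self.deleted.clear()


g = BufferingGraph({"s p o"})
g.add("s p o2")
g.delete("s p o")
assert g.find() == {"s p o2"}   # the view shows the change...
g.abort()
assert g.find() == {"s p o"}    # ...but abort leaves the base untouched
```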



RDFStream to RDFConnection

2019-07-08 Thread Claude Warren
I have written an RDFStream to RDFConnection with caching.  Basically, the
stream caches triples/quads until a limit is reached and then it writes
them to the RDFConnection.  At finish it writes any triples/quads in the
cache to the RDFConnection.

Internally I cache the stream in a dataset.  I write triples to the default
dataset and quads as appropriate.

I have a couple of questions:

1) In this arrangement what does the "base" tell me? I currently ignore it
and want to make sure I haven't missed something.

2) I capture all the prefix calls in a PrefixMapping that is accessible
from the RDFConnectionStream class.  They are not passed into the dataset
in any way.  I didn't see any method to do so and don't really think it is
needed.  Does anyone see a problem with this?

3) Does anyone have a use for this class?  If so I am happy to contribute
it, though the next question becomes what module to put it in?  Perhaps we
should have an extras package for RDFStream implementations?

Claude

-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
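[Editorial note: the batching scheme Claude describes above - buffer triples/quads as they stream in, flush to the connection when the batch size is reached, and flush the remainder at finish() - can be sketched independently of Jena. The Python below is an illustrative model only: BatchingStream mirrors the shape of a StreamRDF sink and RecordingConnection stands in for an RDFConnection; none of these names are the actual Jena API.]

```python
# Illustrative model of the batching StreamRDF described in this thread.
# "connection" is any object with a load(items) method, standing in for
# Jena's RDFConnection; the real Jena interfaces differ in detail.

class BatchingStream:
    """Buffers triples/quads and flushes them in batches to a connection."""

    def __init__(self, connection, batch_size=1000):
        self.connection = connection
        self.batch_size = batch_size
        self.buffer = []

    def triple(self, t):
        # Called once per parsed triple; flush when the batch is full.
        self.buffer.append(t)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def quad(self, q):
        # Quads share the same buffer in this sketch.
        self.triple(q)

    def flush(self):
        if self.buffer:
            self.connection.load(self.buffer)  # one round trip per batch
            self.buffer = []

    def finish(self):
        # Parser signals end of input: write any remainder.
        self.flush()


class RecordingConnection:
    """Stand-in for an RDFConnection that records each batch it receives."""
    def __init__(self):
        self.batches = []

    def load(self, items):
        self.batches.append(list(items))


conn = RecordingConnection()
stream = BatchingStream(conn, batch_size=2)
for t in ["t1", "t2", "t3"]:
    stream.triple(t)
stream.finish()
# conn.batches is now [["t1", "t2"], ["t3"]]
```

In Claude's actual class the buffer is a Dataset (triples to the default graph, quads as given) and each flush is a load over the RDFConnection, but the control flow is the same.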