Re: RDFStream to RDFConnection
+1 to a spot in jena-examples with a write-up on our site.

ajs6f
Re: RDFStream to RDFConnection
How big is it? One file? A module, even under jena-extras, seems a tad heavy.

Stepping back from the specifics, this might be one of several things:

Is this more of an example of how to do something? That could be done by publishing the source, still within the Apache legal framework. We have jena-examples, package org/apache/jena/example/, and that gets into the release source. Maybe that's a way without too much ceremony.

Or more "documentation" via the web-site? Or the cwiki?

    Andy
Re: RDFStream to RDFConnection
In my case one document is 2 million triples. I set a default batch size of 1000 (I think; I don't have the code in front of me), but that is overridable as a constructor parameter. More work is needed to determine what the proper default batch size is.

Internally I send the triples/quads to a dataset and, after the batch size is reached (or on finish()), send the dataset to the RDFConnection. It is a simplistic implementation, but one that seems to work for my case.

Claude
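The internal cache described above (triples to the default graph, quads to their named graph, shipped as one dataset per batch) can be sketched with a map of lists standing in for a Jena Dataset. This is a sketch, not Claude's actual code: the class and method names are hypothetical, and only the default of 1000 comes from the message above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the per-batch cache: triples land in the default graph,
// quads in their named graph; size() drives the flush decision.
class QuadBuffer {
    static final int DEFAULT_BATCH_SIZE = 1000;  // overridable via constructor in a real class
    static final String DEFAULT_GRAPH = "";

    private final Map<String, List<String>> graphs = new HashMap<>();
    private int count = 0;

    void triple(String t) { quad(DEFAULT_GRAPH, t); }

    void quad(String graph, String t) {
        graphs.computeIfAbsent(graph, g -> new ArrayList<>()).add(t);
        count++;
    }

    boolean full()  { return count >= DEFAULT_BATCH_SIZE; }
    int size()      { return count; }
    Map<String, List<String>> graphs() { return graphs; }
}
```

In the real implementation the buffer would be a Jena Dataset and the flush would hand it to the RDFConnection; the map keeps the sketch self-contained.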
Re: RDFStream to RDFConnection
Claude,

How many triples does processing one XML document produce? There seem to be several ways to get a batching/buffering effect, including with current code: e.g. send the StreamRDF to a graph, then send the graph over the RDFConnection.

One of the nuisances of HTTP is the need to have payloads that are correct for both request and response. Otherwise, streaming directly to the Fuseki server would be nice, but it needs to allow for request-side abort. In fact, if you do a GSP request and stream the body, and the request has a parse error, it will abort; but forcing a parse error because the request side found a higher-level condition that means it wants to stop (e.g. the user presses cancel) is pretty ugly.

For SPARQL 1.2, I've suggested developing a websockets protocol so that interactions with the server can be more sophisticated, but that's a long way off yet.

    Andy
Re: RDFStream to RDFConnection
So, the question is: should I go ahead and create a library of StreamRDF implementations in the extras section? I could see one to do serialization over Kafka (or other queue implementations).

--
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
Re: RDFStream to RDFConnection
The case I was trying to solve was reading a largish XML document and converting it to an RDF graph. After a few iterations I ended up writing a custom SAX parser that calls the StreamRDF triple/quad methods. But I wanted a way to update a Fuseki server, so RDFConnection seemed like the natural choice.

In some recent work for my employer I found that I like RDFConnection, as the same code can work against a local dataset or a remote one.

Claude
Re: RDFStream to RDFConnection
This "replay" buffer approach was the direction I first went in for TIM, until turning to MVCC (speaking of MVCC, that code is probably somewhere, since we don't squash when we merge). Looking back, one thing that helped me move on was the potential effect of very large transactions. But in a controlled situation like Claude's, that problem wouldn't arise.

ajs6f
Re: RDFStream to RDFConnection
Claude,

Good timing!

This is what RDF Delta does, and for updates rather than just StreamRDF additions, though it's not to an RDFConnection: it's to a patch service.

With hindsight, I wonder if that would have been better as a BufferingDatasetGraph: a DSG that keeps changes and makes the view of the buffer and the underlying DatasetGraph behave correctly (find* works and has the right cardinality of results). It's a bit fiddly to get it all right, but once it works it is a building block that has a lot of re-usability.

I came across this with the SHACL work, for a BufferingGraph (with prefixes) to give "abort" of transactions to simple graphs which aren't transactional.

But it also occurs in Fuseki with complex dataset setups, like rules.

    Andy

On 08/07/2019 11:09, Claude Warren wrote:
> 1) In this arrangement what does the "base" tell me? I currently ignore it
> and want to make sure I haven't missed something.

The parser saw a BASE statement.

Like PREFIX, in Turtle, it can happen mid-file (e.g. when files are concatenated).

It's not necessary, because the data stream should have resolved IRIs in it, so base is seldom used in a stream.

> 2) I capture all the prefix calls in a PrefixMapping that is accessible
> from the RDFConnectionStream class. They are not passed into the dataset
> in any way. I didn't see any method to do so and don't really think it is
> needed. Does anyone see a problem with this?
> 3) Does anyone have a use for this class? If so I am happy to contribute
> it, though the next question becomes what module to put it in? Perhaps we
> should have an extras package for StreamRDF implementations?
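The BufferingDatasetGraph idea Andy describes can be shown in miniature. This is a heavily simplified sketch (quads as strings, a Set standing in for the wrapped DatasetGraph; none of it is the Jena API): additions are held aside, reads see the union of buffer and base, commit pushes the buffer down, and abort just discards it, which is how a non-transactional store gets "abort" for free.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of a buffering store: changes are kept aside so the underlying
// (possibly non-transactional) store is untouched until commit(); abort()
// simply drops the buffer. A real BufferingDatasetGraph would also track
// deletions and keep find* result cardinality correct.
class BufferingStore {
    private final Set<String> base;               // the wrapped store
    private final Set<String> added = new HashSet<>();

    BufferingStore(Set<String> base) { this.base = base; }

    void add(String quad) {
        if (!base.contains(quad))                 // avoid double-counting
            added.add(quad);
    }

    boolean contains(String quad) {               // view = base plus buffer
        return base.contains(quad) || added.contains(quad);
    }

    void commit() { base.addAll(added); added.clear(); }

    void abort()  { added.clear(); }              // base never saw the changes
}
```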
RDFStream to RDFConnection
I have written an RDFStream to RDFConnection with caching. Basically, the stream caches triples/quads until a limit is reached and then writes them to the RDFConnection. At finish it writes any triples/quads remaining in the cache to the RDFConnection.

Internally I cache the stream in a dataset. I write triples to the default graph and quads as appropriate.

I have a couple of questions:

1) In this arrangement, what does the "base" tell me? I currently ignore it and want to make sure I haven't missed something.

2) I capture all the prefix calls in a PrefixMapping that is accessible from the RDFConnectionStream class. They are not passed into the dataset in any way. I didn't see any method to do so and don't really think it is needed. Does anyone see a problem with this?

3) Does anyone have a use for this class? If so I am happy to contribute it, though the next question becomes: what module to put it in? Perhaps we should have an extras package for StreamRDF implementations?

Claude
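The caching stream described above follows a plain buffer-and-flush pattern. A minimal sketch in plain Java, with no Jena dependency (class and method names here are hypothetical, standing in for the real StreamRDF/RDFConnection types):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of the buffering stream: items (triples/quads) accumulate until
// the batch limit is hit, then the whole batch goes to the sink (standing
// in for the RDFConnection); finish() flushes whatever remains.
class BatchingStream<T> {
    private final int limit;
    private final Consumer<List<T>> sink;
    private final List<T> buffer = new ArrayList<>();

    BatchingStream(int limit, Consumer<List<T>> sink) {
        this.limit = limit;
        this.sink = sink;
    }

    // Mirrors StreamRDF.triple()/quad(): add, flush when the limit is reached.
    void item(T t) {
        buffer.add(t);
        if (buffer.size() >= limit)
            flush();
    }

    // Mirrors StreamRDF.finish(): push out any partial batch.
    void finish() {
        if (!buffer.isEmpty())
            flush();
    }

    private void flush() {
        sink.accept(new ArrayList<>(buffer));  // hand over a copy, then reset
        buffer.clear();
    }
}
```

In the real class the sink would be something like a call that loads the cached dataset over the RDFConnection inside a transaction; the generic Consumer keeps the sketch self-contained.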