On 16/09/11 21:22, Glenn Ammons wrote:
I have a number of CSV files over which I would like to do SPARQL
queries, without converting them to RDF first.  I'm trying to figure
out how to extend Jena so that each flat file would appear to ARQ
queries as a new named graph.  This page:

http://jena.sourceforge.net/ARQ/arq-query-eval.html

suggests extending GraphBase.java, which is straightforward enough,
but it doesn't explain how to register the new Graph implementation
with the system.  It seems to me that, at a minimum, I would need some
way to inform query execution of the named graphs that my extension
supplies.

Are there any examples of such an extension?  I know about
ARQ-2.8.8/src-examples/arq/examples/engine/MyQueryEngine.java, but I'm
not sure that I need to write my own query engine.  I've also been
looking at the TDB initialization code.

Thanks.
--glenn

Glenn,

You don't need to introduce the graph implementation to the system - TDB does some initialization for other reasons.

If you extend GraphBase (one method needed - find(s,p,o)) to deal with CSV mapping then it's done.
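A skeleton of that route might look like the following. This is a sketch, not a definitive implementation: the class name CsvGraph is invented, the triples are assumed to have been built from the CSV rows elsewhere, and the graphBaseFind(TripleMatch) signature shown is the Jena 2.x one from the ARQ 2.8.8 era (later Jena versions take a Triple pattern instead).

```java
import java.util.ArrayList;
import java.util.List;

import com.hp.hpl.jena.graph.Node;
import com.hp.hpl.jena.graph.Triple;
import com.hp.hpl.jena.graph.TripleMatch;
import com.hp.hpl.jena.graph.impl.GraphBase;
import com.hp.hpl.jena.util.iterator.ExtendedIterator;
import com.hp.hpl.jena.util.iterator.WrappedIterator;

/**
 * Sketch of a read-only Graph over one CSV file.  The CSV rows are assumed
 * to have been translated into a triple list already; a lazier version
 * could scan the file inside graphBaseFind instead.
 */
public class CsvGraph extends GraphBase {
    private final List<Triple> triples;

    public CsvGraph(List<Triple> triplesFromCsv) {
        this.triples = triplesFromCsv;
    }

    @Override
    protected ExtendedIterator<Triple> graphBaseFind(TripleMatch m) {
        // Linear scan; fine for modest files, replace with an index if needed.
        List<Triple> matched = new ArrayList<Triple>();
        for (Triple t : triples)
            if (slotMatches(m.getMatchSubject(), t.getSubject())
                    && slotMatches(m.getMatchPredicate(), t.getPredicate())
                    && slotMatches(m.getMatchObject(), t.getObject()))
                matched.add(t);
        return WrappedIterator.create(matched.iterator());
    }

    // In a TripleMatch, null and Node.ANY both mean "wildcard".
    private static boolean slotMatches(Node pattern, Node node) {
        return pattern == null || Node.ANY.equals(pattern) || pattern.equals(node);
    }
}
```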

Then  (1)

Model model = ModelFactory.createModelForGraph(graph) ;

and you can put one model per file in a DataSource.
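Spelled out, option (1) might look like this (Jena 2.x-era names; the helper class and its shape are invented for illustration, with the CSV-backed graphs passed in keyed by the name each should have in the dataset):

```java
import java.util.Map;

import com.hp.hpl.jena.graph.Graph;
import com.hp.hpl.jena.query.DataSource;
import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.DatasetFactory;
import com.hp.hpl.jena.rdf.model.ModelFactory;

public class CsvDatasetBuilder {
    /** Wrap each CSV-backed Graph as a Model and register it under its name. */
    public static Dataset build(Map<String, Graph> csvGraphs) {
        DataSource ds = DatasetFactory.create();   // empty in-memory dataset
        for (Map.Entry<String, Graph> e : csvGraphs.entrySet())
            ds.addNamedModel(e.getKey(),
                             ModelFactory.createModelForGraph(e.getValue()));
        return ds;
    }
}
```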

or (2) you can skip the Model layer: build a DatasetGraph (e.g. DatasetGraphMap) and wrap it up as a Dataset with DatasetFactory.create(datasetGraph).
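Option (2) might be sketched like this (again a sketch, with invented class and method names around the real ARQ calls; the DatasetGraphMap constructor and DatasetGraph.addGraph shown here are from the ARQ 2.8.x-era API and may differ in other versions):

```java
import java.util.Map;

import com.hp.hpl.jena.graph.Graph;
import com.hp.hpl.jena.graph.Node;
import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.DatasetFactory;
import com.hp.hpl.jena.sparql.core.DatasetGraphMap;

public class CsvDatasetGraphBuilder {
    /** Put each CSV-backed Graph straight into a DatasetGraphMap. */
    public static Dataset build(Graph defaultGraph, Map<String, Graph> csvGraphs) {
        DatasetGraphMap dsg = new DatasetGraphMap(defaultGraph);
        for (Map.Entry<String, Graph> e : csvGraphs.entrySet())
            dsg.addGraph(Node.createURI(e.getKey()), e.getValue());
        return DatasetFactory.create(dsg);
    }
}
```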

These two get you to the same place from the point-of-view of the SPARQL system. They each use a general purpose Dataset(Graph) implementation that maps down to specific graphs.

This general purpose Dataset implementation is already in ARQ - no need to register specific query engines for this case.

Both execute SPARQL queries via the Graph.find operation. Everything will work.
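For completeness, running a query over such a dataset is ordinary ARQ usage, nothing special for the CSV-backed case (the helper class here is invented; the query API calls are standard ARQ):

```java
import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.query.ResultSetFormatter;

public class CsvQuery {
    /** Run a SELECT over the dataset; ARQ answers it through Graph.find. */
    public static void run(Dataset dataset) {
        Query query = QueryFactory.create(
            "SELECT * WHERE { GRAPH ?g { ?s ?p ?o } }");
        QueryExecution qexec = QueryExecutionFactory.create(query, dataset);
        try {
            ResultSetFormatter.out(qexec.execSelect());
        } finally {
            qexec.close();
        }
    }
}
```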

Only if you want a specialised engine that does something more (specialised indexing, say, or so many CSV files that a dedicated storage system is worthwhile) do you need to go further.

Even if that's needed, I suggest implementing the GraphBase route first because it is the quickest way to get something working. The thing you will have to add is the translation between RDF and the CSV data model used in the columns.
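That translation is a design decision of yours, not something Jena fixes. One possible convention, purely as an illustration (all URI shapes here are invented): row i of a file becomes subject <fileUri#row-i>, each column name becomes a predicate in the file's namespace, and the cell value becomes a plain literal.

```java
/**
 * A tiny, self-contained sketch of one possible CSV-to-triple naming
 * convention.  The URI shapes are invented for illustration only.
 */
public class CsvMapping {
    /** Build the subject/predicate/object strings for one CSV cell. */
    public static String[] cellToTriple(String fileUri, int rowIndex,
                                        String columnName, String value) {
        String subject = fileUri + "#row-" + rowIndex;
        String predicate = fileUri + "#" + columnName;
        return new String[] { subject, predicate, value };
    }

    public static void main(String[] args) {
        String[] t = cellToTriple("http://example/data.csv", 0, "name", "Glenn");
        // prints: http://example/data.csv#row-0 http://example/data.csv#name "Glenn"
        System.out.println(t[0] + " " + t[1] + " \"" + t[2] + "\"");
    }
}
```

Whatever convention you pick, keep it deterministic so that the same file always yields the same triples; that is what makes the graph view behave like stored data.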

ARQ has the CSV and TSV output that is now a W3C draft, in case that helps for getting information back out of the RDF view.

http://www.w3.org/TR/sparql11-results-csv-tsv/
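If your ARQ build has it (the CSV/TSV result writers appeared around this time, so check your version), writing a result set out in that format is one call. A minimal sketch, run here over an empty model just to show the call:

```java
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.query.ResultSetFormatter;
import com.hp.hpl.jena.rdf.model.ModelFactory;

public class CsvResults {
    public static void main(String[] args) {
        QueryExecution qe = QueryExecutionFactory.create(
            QueryFactory.create("SELECT * { ?s ?p ?o }"),
            ModelFactory.createDefaultModel());
        // Write the result set in the draft CSV format (outputAsTSV for TSV).
        ResultSetFormatter.outputAsCSV(System.out, qe.execSelect());
        qe.close();
    }
}
```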

TDB implements a specialized DatasetGraph and executes SPARQL queries (the BGP parts) directly over its indexes, without converting back and forth through Node objects - that's why it registers itself. If you put TDB-backed graphs in a general purpose dataset, then ARQ is going to access TDB via the Graph.find route.

        Andy
