[ https://issues.apache.org/jira/browse/JENA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Claus Stadler updated JENA-1502:
--------------------------------
    Summary: SPARQL extensions for processing CSV, XML, JSON and remote data  (was: SPARQL extensions for processing CSV, XML and JSON)

> SPARQL extensions for processing CSV, XML, JSON and remote data
> ---------------------------------------------------------------
>
>                 Key: JENA-1502
>                 URL: https://issues.apache.org/jira/browse/JENA-1502
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ
>    Affects Versions: Jena 3.6.0
>            Reporter: Claus Stadler
>            Priority: Major
>
> Many systems have been built so far for transforming heterogeneous data
> (most prominently CSV, XML and JSON) to RDF.
> As it turns out, with a few extensions to ARQ, Jena becomes (at least for me)
> an extremely convenient tool for this task.
> To illustrate: for a project we have to convert several (open) datasets,
> and we came up with a solution where we just have to execute a sequence of
> SPARQL queries that make use of our ARQ extensions.
> In [this
> repository|https://github.com/QROWD/QROWD-RDF-Data-Integration/tree/master/datasets/1046-1051]
> there are subfolders with JSON datasets, and the conversion is just a
> matter of running the SPARQL queries in the files
> [workloads.sparql|https://github.com/QROWD/QROWD-RDF-Data-Integration/blob/master/datasets/1046-1051/workloads.sparql]
> (which adds triples describing workloads to a Jena in-memory dataset) and
> [process.sparql|https://github.com/QROWD/QROWD-RDF-Data-Integration/blob/master/datasets/1046-1051/process.sparql]
> (which processes all workloads in that dataset and inserts triples into a
> (named) result graph). We created a [thin command line
> wrapper|https://github.com/SmartDataAnalytics/Sparqlintegrate] to
> conveniently run these conversions.
> An example of these extension functions:
> {code:sql}
> # Add labels of train / bus stops
> INSERT {
>   GRAPH eg:result { ?s rdfs:label ?l }
> }
> WHERE {
>   ?x eg:workload ?o
>   BIND(json:path(?o, "$.stopNames") AS ?stopNames)
>   ?stopNames json:unnest (?l ?i) .
>   GRAPH ?x { ?s eg:stopId ?i }
> }
> {code}
> In fact, these SPARQL ARQ extensions would enable any Jena-based project to
> perform such integration tasks - for instance, one could already start a
> Fuseki server in order to play around with conversions in a Web interface.
> * Is there interest in integrating our ARQ [SPARQL extension
> functions|https://github.com/SmartDataAnalytics/jena-sparql-api/tree/develop/jena-sparql-api-sparql-ext]
> into Jena? If so, what would we have to do, and which existing or new
> Jena module would be the most appropriate place?
> We are also open to discussion and changes on what exactly the signatures of
> these extension functions should look like. For instance, right now we use
> two custom datatypes, xsd:json and xsd:xml, which obviously should be replaced
> by better IRIs.
> * Maybe the functionality of running files containing sequences of SPARQL
> queries from the command line could also be added to Jena directly, as I
> think nothing about it is outside the scope of Jena.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
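
For readers unfamiliar with the extension functions used in the example query, the following is a rough Python sketch of what the WHERE clause appears to compute. It is not the Jena implementation. The helper names (json_path_member, unnest, label_triples) are hypothetical, the sample data is made up, and the sketch assumes that json:unnest binds each array element together with its 0-based position; the issue text does not state the index base.

```python
import json


def json_path_member(doc, key):
    """Hypothetical stand-in for json:path(?o, "$.<key>"): top-level member lookup only."""
    return doc.get(key) if isinstance(doc, dict) else None


def unnest(array):
    """Hypothetical stand-in for `?arr json:unnest (?l ?i)`: yields (element, index) pairs.

    Assumption: indices are 0-based. Non-array input yields no bindings.
    """
    if not isinstance(array, list):
        return
    for index, element in enumerate(array):
        yield element, index


def label_triples(workload_json, stop_ids):
    """Mimics the example query for a single workload.

    workload_json: JSON text bound to ?o.
    stop_ids: dict mapping an index ?i to the subjects ?s that have
              `?s eg:stopId ?i` in the workload's graph.
    Returns the (subject, predicate, label) triples the INSERT would add.
    """
    doc = json.loads(workload_json)
    stop_names = json_path_member(doc, "stopNames")
    triples = []
    for label, index in unnest(stop_names):
        # Join on the index, like the shared ?i variable in the query.
        for subject in stop_ids.get(index, []):
            triples.append((subject, "rdfs:label", label))
    return triples


if __name__ == "__main__":
    workload = '{"stopNames": ["Central", "Harbour"]}'
    ids = {0: ["eg:stop0"], 1: ["eg:stop1"]}
    for triple in label_triples(workload, ids):
        print(triple)
```

The join on the index is the point of the example: the unnested position ties each label to the resource that carries the matching eg:stopId value, so array entries without a matching resource produce no triple.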