[ https://issues.apache.org/jira/browse/JENA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398745#comment-16398745 ]

Andy Seaborne commented on JENA-1502:
-------------------------------------

Just on JSON numbers -> RDF:

v3.7.0 will have JS functions, and the conversion from JS to RDF for the
function result tries to narrow the number to an integer if possible:

https://github.com/apache/jena/blob/master/jena-arq/src/main/java/org/apache/jena/sparql/function/js/NV.java
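
For illustration, a minimal sketch of that narrowing step (a hypothetical
standalone helper, not the actual NV.java code):

{code:java}
import org.apache.jena.datatypes.xsd.XSDDatatype;
import org.apache.jena.graph.Node;
import org.apache.jena.graph.NodeFactory;

public class NumberToRdf {
    // Hypothetical helper: turn a JS-style double into an RDF term,
    // narrowing to xsd:integer when there is no fractional part.
    static Node numberToNode(double value) {
        if (!Double.isInfinite(value) && !Double.isNaN(value) && value == Math.rint(value)) {
            return NodeFactory.createLiteral(Long.toString((long) value), XSDDatatype.XSDinteger);
        }
        return NodeFactory.createLiteral(Double.toString(value), XSDDatatype.XSDdouble);
    }

    public static void main(String[] args) {
        System.out.println(numberToNode(42.0));  // typed as xsd:integer
        System.out.println(numberToNode(3.5));   // typed as xsd:double
    }
}
{code}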


> SPARQL extensions for processing CSV, XML, JSON and remote data
> ---------------------------------------------------------------
>
>                 Key: JENA-1502
>                 URL: https://issues.apache.org/jira/browse/JENA-1502
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ
>    Affects Versions: Jena 3.6.0
>            Reporter: Claus Stadler
>            Priority: Major
>
> Many systems have been built so far for transforming heterogeneous data 
> (most prominently CSV, XML and JSON) to RDF.
> As it turns out, with a few extensions to ARQ, Jena becomes (at least for me) 
> an extremely convenient tool for this task.
> To illustrate the point: for one of our projects we have to convert several 
> (open) datasets, and we came up with a solution where we only have to execute 
> a sequence of SPARQL queries making use of our ARQ extensions.
> In [this 
> repository|https://github.com/QROWD/QROWD-RDF-Data-Integration/tree/master/datasets/1046-1051]
>  there are subfolders with JSON datasets, and the conversion is just a 
> matter of running the SPARQL queries in the files 
> [workloads.sparql|https://github.com/QROWD/QROWD-RDF-Data-Integration/blob/master/datasets/1046-1051/workloads.sparql]
>  (which adds triples describing workloads into a Jena in-memory dataset) and 
> [process.sparql|https://github.com/QROWD/QROWD-RDF-Data-Integration/blob/master/datasets/1046-1051/process.sparql]
>  (which processes all workloads in that dataset and inserts triples into a 
> (named) result graph). We created a [thin command line 
> wrapper|https://github.com/SmartDataAnalytics/Sparqlintegrate] to 
> conveniently run these conversions.
> An example of these extension functions:
> {code:sql}
> # Add labels of train / bus stops
> INSERT {
>   GRAPH eg:result { ?s rdfs:label ?l }
> }
> WHERE {
>   # Magic property to fetch the text (at present always a string) of some URL
>   <someUrlPointingToALocalOrRemoteDataset> url:text ?src .
>   # Parse into a literal of JSON datatype
>   BIND(STRDT(?src, xsd:json) AS ?o)
>   # Access a JSON array attribute
>   BIND(json:path(?o, "$.stopNames") AS ?stopNames)
>   # Create bindings for each element in the JSON document
>   ?stopNames json:unnest (?l ?i) .
>   # An ordinary join with existing data
>   GRAPH ?x { ?s eg:stopId ?i }
> }
> {code}
> In fact, these ARQ SPARQL extensions would enable any Jena-based project to 
> perform such integration tasks; for instance, one could already start a 
> Fuseki instance to play around with conversions in a Web interface.
> * Is there interest in integrating our ARQ [SPARQL extension 
> functions|https://github.com/SmartDataAnalytics/jena-sparql-api/tree/develop/jena-sparql-api-sparql-ext]
>  into Jena? If so, what would we have to do, and where (which existing or new 
> Jena module) would be the most appropriate place?
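> For reference, a minimal sketch of how a custom function is registered with 
> ARQ (the function and IRI below are made up purely to show the mechanism such 
> extensions would rely on):
> {code:java}
> import org.apache.jena.sparql.expr.NodeValue;
> import org.apache.jena.sparql.function.FunctionBase2;
> import org.apache.jena.sparql.function.FunctionRegistry;
>
> // Made-up example function: concatenates two string arguments.
> public class ExampleConcat extends FunctionBase2 {
>     @Override
>     public NodeValue exec(NodeValue v1, NodeValue v2) {
>         return NodeValue.makeString(v1.getString() + v2.getString());
>     }
>
>     public static void main(String[] args) {
>         // Map a function IRI to the implementation class.
>         FunctionRegistry.get().put("http://example.org/fn#concat", ExampleConcat.class);
>         // Now usable in SPARQL as <http://example.org/fn#concat>(?a, ?b).
>     }
> }
> {code}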
> We are also open to discussion and changes on what exactly the signatures of 
> these extension functions should look like. For instance, right now we use 
> two custom datatypes, xsd:json and xsd:xml, which obviously should be replaced 
> by better IRIs.
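> As a point of comparison, a custom datatype can be registered with Jena's 
> TypeMapper along these lines (the IRI is a placeholder, not a proposal):
> {code:java}
> import org.apache.jena.datatypes.BaseDatatype;
> import org.apache.jena.datatypes.TypeMapper;
> import org.apache.jena.rdf.model.Literal;
> import org.apache.jena.rdf.model.Model;
> import org.apache.jena.rdf.model.ModelFactory;
>
> public class RegisterJsonDatatype {
>     public static void main(String[] args) {
>         // Placeholder IRI; the final IRIs are exactly what is open for discussion.
>         BaseDatatype jsonType = new BaseDatatype("http://example.org/datatype#json");
>         TypeMapper.getInstance().registerDatatype(jsonType);
>
>         // Literals of the custom datatype can then be created as usual.
>         Model model = ModelFactory.createDefaultModel();
>         Literal lit = model.createTypedLiteral("{\"stopNames\": [\"A\", \"B\"]}", jsonType);
>         System.out.println(lit);
>     }
> }
> {code}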
> * Maybe the functionality of running files containing sequences of SPARQL 
> queries from the command line could also be added to Jena directly, as I 
> think there is no magic to it beyond what Jena already provides; see the 
> sketch below.
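> Such a runner could be a thin wrapper over existing Jena APIs; a minimal 
> sketch, assuming each file holds a sequence of SPARQL Update operations and 
> the resulting dataset is written out as TriG:
> {code:java}
> import org.apache.jena.query.Dataset;
> import org.apache.jena.query.DatasetFactory;
> import org.apache.jena.riot.Lang;
> import org.apache.jena.riot.RDFDataMgr;
> import org.apache.jena.update.UpdateAction;
>
> public class RunSparqlFiles {
>     public static void main(String[] args) {
>         // Shared in-memory dataset; files are executed in order, so later
>         // files see the triples inserted by earlier ones.
>         Dataset dataset = DatasetFactory.create();
>         for (String file : args) {
>             UpdateAction.readExecute(file, dataset);
>         }
>         // Dump the default and named graphs of the resulting dataset as TriG.
>         RDFDataMgr.write(System.out, dataset, Lang.TRIG);
>     }
> }
> {code}
> Run as, e.g., {{java RunSparqlFiles workloads.sparql process.sparql}}.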



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
