Creating JSON from RDF

Jeni Tennison Sat, 12 Dec 2009 13:44:32 -0800

Hi,

As part of the linked data work the UK government is doing, we're looking at how to use the linked data that we have as the basis of APIs that are readily usable by developers who really don't want to learn about RDF or SPARQL.

One thing that we want to do is provide JSON representations of both RDF graphs and SPARQL results. I wanted to run some ideas past this group as to how we might do that.

To put this in context, what I think we should aim for is a pure publishing format that is optimised for approachability for normal developers, *not* an interchange format. RDF/JSON [1] and the SPARQL results JSON format [2] aren't entirely satisfactory as far as I'm concerned because of the way the objects of statements are represented as JSON objects rather than as simple values. I still think we should produce them (to wean people on to, and for those using more generic tools), but I'd like to think about producing something that is a bit more immediately approachable too.

RDFj [3] is closer to what I think is needed here. However, I don't think there's a need for setting 'context' given I'm not aiming for an interchange format, there are no clear rules about how to generate it from an arbitrary graph (basically there can't be without some additional configuration) and it's not clear how to deal with datatypes or languages.

I suppose my first question is whether there are any other JSON-based formats that we should be aware of, that we could use or borrow ideas from?

Assuming there aren't, I wanted to discuss what generic rules we might use, where configuration is necessary and how the configuration might be done.


# RDF Graphs #

Let's take as an example:

  <http://www.w3.org/TR/rdf-syntax-grammar>
    dc:title "RDF/XML Syntax Specification (Revised)" ;
    ex:editor [
      ex:fullName "Dave Beckett" ;
      ex:homePage <http://purl.org/net/dajobe/> ;
    ] .

In JSON, I think we'd like to create something like:

  {
    "$": "http://www.w3.org/TR/rdf-syntax-grammar",
    "title": "RDF/XML Syntax Specification (Revised)",
    "editor": {
      "name": "Dave Beckett",
      "homepage": "http://purl.org/net/dajobe/"
    }
  }

Note that the "$" is taken from RDFj. I'm not convinced it's a good idea to use this symbol, rather than simply a property called "about" or "this" -- any opinions?

Also note that I've made no distinction in the above between a URI and a literal, while RDFj uses <>s around literals. My feeling is that normal developers really don't care about the distinction between a URI literal and a pointer to a resource, and that they will base the treatment of the value of a property on the (name of) the property itself.

So, the first piece of configuration that I think we need here is to map properties on to short names that make good JSON identifiers (i.e. name tokens without hyphens). Given that properties normally have lowercaseCamelCase local names, it should be possible to use that as a default. If you need something more readable, though, it seems like it should be possible to use a property of the property, such as:


  ex:fullName api:jsonName "name" .
  ex:homePage api:jsonName "homepage" .

However, in any particular graph, there may be properties that have been given the same JSON name (or, even more probably, local name). We could provide multiple alternative names that could be chosen between, but any mapping to JSON is going to need to give consistent results across a given dataset for people to rely on it as an API, and that means the mapping can't be based on what's present in the data. We could do something with prefixes, but I have a strong aversion to assuming global prefixes.

So I think this means that we need to provide configuration at an API level rather than at a global level: something that can be used consistently across a particular API to determine the token that's used for a given property. For example:


  <> a api:JSON ;
    api:mapping [
      api:property ex:fullName ;
      api:name "name" ;
    ] , [
      api:property ex:homePage ;
      api:name "homepage" ;
    ] .
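
To make the lookup concrete, here is a minimal Python sketch of how an API-level name mapping with a local-name fallback might behave. The `EX` namespace URI and the function names are assumptions made up for illustration; they are not part of any existing vocabulary or library.

```python
import re

EX = "http://example.org/terms/"  # assumed namespace for the ex: prefix

# Assumed in-memory form of the api:mapping entries for one API
NAME_MAP = {
    EX + "fullName": "name",
    EX + "homePage": "homepage",
}

def local_name(uri):
    """Return the part of a URI after its last '#' or '/'."""
    return re.split(r"[#/]", uri.rstrip("#/"))[-1]

def json_name(prop_uri, name_map=NAME_MAP):
    """Use the configured name if there is one, else the local name."""
    return name_map.get(prop_uri, local_name(prop_uri))

def to_json_object(properties, name_map=NAME_MAP):
    """Rename the keys of a {property URI: value} dict to JSON names."""
    return {json_name(p, name_map): v for p, v in properties.items()}
```

Because the mapping is fixed per API rather than derived from the data, the same property always gets the same token, whatever else is in the graph.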

There are four more areas where I think there's configuration we need to think about:


  * multi-valued properties
  * typed and language-specific values
  * nesting objects
  * suppressing properties

## Multi-valued Properties ##

First one first. It seems obvious that if you have a property with multiple values, it should turn into a JSON array structure. For example:


  [] foaf:name "Anna Wilder" ;
    foaf:nick "wilding", "wilda" ;
    foaf:homepage <http://example.org/about> .

should become something like:

  {
    "name": "Anna Wilder",
    "nick": [ "wilding", "wilda" ],
    "homepage": "http://example.org/about"
  }

The trouble is that if you determine whether something is an array or not based on the data that is actually available, you'll get situations where the value of a particular JSON property is sometimes an array and sometimes a string; that's bad for predictability for the people using the API. (RDF/JSON solves this by every value being an array, but that's counter-intuitive for normal developers.)

So I think a second API-level configuration that needs to be made is to indicate which properties should be arrays and which not:


  <> a api:API ;
    api:mapping [
      api:property foaf:nick ;
      api:name "nick" ;
      api:array true ;
    ] .
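
As a rough Python sketch of that rule: whether a property is an array is decided by configuration alone, never by how many values happen to be present. The names here are hypothetical, and raising an error when a non-array property has several values is my own assumption; the proposal above doesn't say what should happen in that case.

```python
from collections import defaultdict

# JSON names configured with api:array true (assumed representation)
ARRAY_PROPS = {"nick"}

def to_json_object(pairs, array_props=ARRAY_PROPS):
    """Build a JSON-ready dict from (name, value) pairs."""
    grouped = defaultdict(list)
    for name, value in pairs:
        grouped[name].append(value)

    result = {}
    for name, values in grouped.items():
        if name in array_props:
            # Always a list, even when there is only one value
            result[name] = values
        elif len(values) == 1:
            result[name] = values[0]
        else:
            # Assumption: treat this as a configuration error
            raise ValueError(f"multiple values for non-array property {name!r}")
    return result
```

With this, a person with a single nick still gets `"nick": ["wilding"]`, so client code never has to check the type of the value.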

## Typed Values and Languages ##

Typed values and values with languages are really the same problem. If we have something like:


  <http://statistics.data.gov.uk/id/local-authority-district/00PB>
    skos:prefLabel "The County Borough of Bridgend"@en ;
    skos:prefLabel "Pen-y-bont ar Ogwr"@cy ;
    skos:notation "00PB"^^geo:StandardCode ;
    skos:notation "6405"^^transport:LocalAuthorityCode .

then we'd really want the JSON to look something like:

  {
    "$": "http://statistics.data.gov.uk/id/local-authority-district/00PB",
    "name": "The County Borough of Bridgend",
    "welshName": "Pen-y-bont ar Ogwr",
    "onsCode": "00PB",
    "dftCode": "6405"
  }

I think that for this to work, the configuration needs to be able to filter values based on language or datatype to determine the JSON property name. Something like:


  <> a api:JSON ;
    api:mapping [
      api:property skos:prefLabel ;
      api:lang "en" ;
      api:name "name" ;
    ] , [
      api:property skos:prefLabel ;
      api:lang "cy" ;
      api:name "welshName" ;
    ] , [
      api:property skos:notation ;
      api:datatype geo:StandardCode ;
      api:name "onsCode" ;
    ] , [
      api:property skos:notation ;
      api:datatype transport:LocalAuthorityCode ;
      api:name "dftCode" ;
    ] .
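
A possible Python sketch of this filtering: pick the first mapping whose property matches and whose language or datatype constraint, if it has one, also matches. The `GEO` and `TRANSPORT` namespace URIs are placeholders I've made up, and returning `None` for unmatched values is an assumption rather than part of the proposal.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
GEO = "http://example.org/geo/"              # assumed namespace for geo:
TRANSPORT = "http://example.org/transport/"  # assumed namespace for transport:

# Assumed in-memory form of the api:mapping entries above
MAPPINGS = [
    {"property": SKOS + "prefLabel", "lang": "en", "name": "name"},
    {"property": SKOS + "prefLabel", "lang": "cy", "name": "welshName"},
    {"property": SKOS + "notation", "datatype": GEO + "StandardCode",
     "name": "onsCode"},
    {"property": SKOS + "notation",
     "datatype": TRANSPORT + "LocalAuthorityCode", "name": "dftCode"},
]

def name_for_value(prop, lang=None, datatype=None, mappings=MAPPINGS):
    """Return the JSON name for a value, or None if no mapping matches."""
    for m in mappings:
        if m["property"] != prop:
            continue
        if "lang" in m and m["lang"] != lang:
            continue
        if "datatype" in m and m["datatype"] != datatype:
            continue
        return m["name"]
    return None
```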

## Nesting Objects ##

Regarding nested objects, I'm again inclined to view this as a configuration option rather than something that is based on the available data. For example, if we have:


  <http://example.org/about>
    dc:title "Anna's Homepage"@en ;
    foaf:maker <http://example.org/anna> .

  <http://example.org/anna>
    foaf:name "Anna Wilder" ;
    foaf:homepage <http://example.org/about> .

this could be expressed in JSON as either:

  {
    "$": "http://example.org/about",
    "title": "Anna's Homepage",
    "maker": {
      "$": "http://example.org/anna",
      "name": "Anna Wilder",
      "homepage": "http://example.org/about"
    }
  }

or:

  {
    "$": "http://example.org/anna",
    "name": "Anna Wilder",
    "homepage": {
      "$": "http://example.org/about",
      "title": "Anna's Homepage",
      "maker": "http://example.org/anna"
    }
  }

The one that's required could be indicated through the configuration, for example:


  <> a api:API ;
    api:mapping [
      api:property foaf:maker ;
      api:name "maker" ;
      api:embed true ;
    ] .
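
Here is a small Python sketch of how an embed flag could drive the nesting. The graph is held as a plain dict keyed by subject URI, which is a simplification for illustration, and the `seen` guard against cycles is my own addition: the two resources above point at each other, so something has to stop the recursion if both properties are marked for embedding.

```python
# Assumed simplified graph: subject URI -> {JSON name: value}
GRAPH = {
    "http://example.org/about": {
        "title": "Anna's Homepage",
        "maker": "http://example.org/anna",
    },
    "http://example.org/anna": {
        "name": "Anna Wilder",
        "homepage": "http://example.org/about",
    },
}

# JSON names configured with api:embed true
EMBED = {"maker"}

def to_json(subject, graph=GRAPH, embed=EMBED, seen=frozenset()):
    """Describe a subject, nesting the values of embedded properties."""
    seen = seen | {subject}
    obj = {"$": subject}
    for name, value in graph[subject].items():
        if name in embed and value in graph and value not in seen:
            obj[name] = to_json(value, graph, embed, seen)
        else:
            # Not embedded (or already visited): leave it as a plain URI
            obj[name] = value
    return obj
```

Calling `to_json("http://example.org/about")` gives the first JSON shape above; calling `to_json("http://example.org/anna", embed={"homepage"})` gives the second.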

## Suppressing Properties ##

The final thought that I had for representing RDF graphs as JSON was about suppressing properties. Basically I'm thinking that this configuration should work on any graph, most likely one generated from a DESCRIBE query. That being the case, it's likely that there will be properties that repeat information (because, for example, they are a super-property of another property). It will make a cleaner JSON API if those repeated properties aren't included. So something like:


  <> a api:API ;
    api:mapping [
      api:property admingeo:contains ;
      api:ignore true ;
    ] .
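
In code, this amounts to a filter applied before any of the other mapping steps. A minimal Python sketch, with an assumed namespace URI for `admingeo:` and hypothetical names throughout:

```python
ADMINGEO = "http://example.org/admingeo/"  # assumed namespace for admingeo:

# Properties configured with api:ignore true
IGNORED = {ADMINGEO + "contains"}

def drop_ignored(triples, ignored=IGNORED):
    """Remove (subject, property, object) triples whose property is ignored."""
    return [(s, p, o) for (s, p, o) in triples if p not in ignored]
```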

# SPARQL Results #

I'm inclined to think that creating JSON representations of SPARQL results that are acceptable to normal developers is less important than creating JSON representations of RDF graphs, for two reasons:

1. SPARQL naturally gives short, usable names to the properties in JSON objects
2. You have to be using SPARQL to create them anyway, and if you're doing that then you can probably grok the extra complexity of having values that are objects

Nevertheless, there are two things that could be done to simplify the SPARQL results format for normal developers.

One would be to just return an array of the results, rather than an object that contains a results property that contains an object with a bindings property that contains an array of the results. People who want metadata can always request the standard SPARQL results JSON format.

The second would be to always return simple values rather than objects. For example, rather than:


  {
    "head": {
      "vars": [ "book", "title" ]
    },
    "results": {
      "bindings": [
        {
          "book": {
            "type": "uri",
            "value": "http://example.org/book/book6"
          },
          "title": {
            "type": "literal",
            "value": "Harry Potter and the Half-Blood Prince"
          }
        },
        {
          "book": {
            "type": "uri",
            "value": "http://example.org/book/book5"
          },
          "title": {
            "type": "literal",
            "value": "Harry Potter and the Order of the Phoenix"
          }
        },
        ...
      ]
    }
  }

a normal developer would want to just get:

  [{
    "book": "http://example.org/book/book6",
    "title": "Harry Potter and the Half-Blood Prince"
  },{
    "book": "http://example.org/book/book5",
    "title": "Harry Potter and the Order of the Phoenix"
  },
  ...
  ]

I don't think we can do any configuration here. It means that information about datatypes and languages isn't visible, but (a) I'm pretty sure that 80% of the time that doesn't matter, (b) there's always the full JSON version if people need it and (c) they could write SPARQL queries that used the datatype/language to populate different variables/properties if they wanted to.
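
Both simplifications together are a very small transformation over the standard SPARQL results JSON format. A minimal Python sketch (the function name is hypothetical):

```python
def simplify_results(sparql_json):
    """Flatten SPARQL results JSON into a list of {variable: value} dicts.

    Type, datatype and language information is deliberately dropped.
    """
    return [
        {var: binding["value"] for var, binding in row.items()}
        for row in sparql_json["results"]["bindings"]
    ]
```

Variables that are unbound in a particular result are simply absent from that row in the standard format, so they would be absent from the simplified object too.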

So there you are. I'd really welcome any thoughts or pointers about any of this: things I've missed, vocabularies we could reuse, things that you've already done along these lines, and so on. Reasons why none of this is necessary are fine too, but I'll warn you in advance that I'm unlikely to be convinced ;)

Thanks,
Jeni

[1]: http://n2.talis.com/wiki/RDF_JSON_Specification
[2]: http://www.w3.org/TR/rdf-sparql-json-res/
[3]: http://code.google.com/p/ubiquity-rdfa/wiki/Rdfj
--
Jeni Tennison
http://www.jenitennison.com
