Re: Creating JSON from RDF

Dave Reynolds Mon, 14 Dec 2009 09:00:26 -0800

Hi Jeni,

Jeni Tennison wrote:

On 13 Dec 2009, at 13:34, Dave Reynolds wrote:

I agree we want both graphs and SPARQL results but I think there is another third case - lists of described objects.
I absolutely agree with you that lists of described objects is an essential part of an API. In fact, I was going to (and will!) write a separate message about possible approaches for creating such lists.
It seemed to me that lists could be represented with RDF like:

  <http://statistics.data.gov.uk/doc/local-authority?page=1>
    rdfs:label "Local Authorities - Page 1" ;
    xhv:next <http://statistics.data.gov.uk/doc/local-authority?page=2> ;
    ...
    api:contents (
      <http://statistics.data.gov.uk/id/local-authority/00QA>
      <http://statistics.data.gov.uk/id/local-authority/00QB>
      <http://statistics.data.gov.uk/id/local-authority/45UB>
      ...
    )

This is just RDF, and as such any rules that we create about mapping RDF graphs to JSON could apply. (I agree that the list page should include extra information about the items in the list, but that seems to me to be a separable issue.)

Sure, but there are some advantages to treating this ordered list of results as an API issue rather than a modelling issue.


I'll respond properly on your other thread.

One thing it makes me think is that perhaps JSON Schema [1] could form the basis of the mechanism for expressing any extra stuff that's required about the properties.


Interesting thought, I'll need to go learn more about JSON Schema first.

Note that the "$" is taken from RDFj. I'm not convinced it's a good idea to use this symbol, rather than simply a property called "about" or "this" -- any opinions?
I'd prefer "id" (though "about" is OK), "$" is too heavily overused in javascript libraries.
I agree. From the brief survey of JSON APIs that I did just now, it seems as though prefixing a reserved property name with a '_' is the usual thing. I'd suggest '_about' because it's similar to RDFa and because '_id', to me at least, implies a local identifier rather than a URI.

No objection to "_about"; as per the separate thread, it was Freebase especially that motivated the suggestion of "id".


[On api:mapping usage]

Are you thinking of this as something the publisher provides or the API caller provides?
If the former, then OK but as I say I think a zero config set of default conventions is OK with the API to allow fine tuning.
I'm thinking of this as something that the publisher of the API creates (to describe/define the API). Note, though, that the publisher of the API might not be the publisher of the data, and that it could feasibly be possible for there to be a service that would allow clients to supply a configuration, point at a datastore, and have the API just work.

OK, agreed. My concern is that developers shouldn't have to wade through this mapping to understand what they are getting, unless they are already RDF heads and care about that aspect.


[On multi-valued properties]

I guess there are two choices if there was no specification:
1. always give one value for the property; if there are several values in the graph, then provide "the first"
2. give an array when there are multiple values and a singleton when there's only one
I did have another vague notion of providing two properties side by side, one singular and one plural, so you would have:
  {
    "nick": "JeniT"
  }

or

  {
    "nicks": ["wilding", "wilda"]
  }
side by side in the same list of objects. But of course that would require configuration anyway (to provide pluralised versions of the label), so I'm not particularly taken with it.
It does concern me that if there are RDF graphs which contain descriptions of several resources of the same type, we might get into a situation where there are two resources for which the default behaviour would be different; we need to have a way of reconciling this (for example, if any of the resources in the graph have multiple values for a property, then it always uses an array).

Yes. With zero configuration there will always either be some inconsistency or you have to force the more general convention on people. I agree with Mark that developers can write code to adapt to the list/no-list case, and with configuration we have the option to make this more consistent in places where this is a problem.

One possibility is a bootstrapping service where you give sample data and ontology, if available, and get back a suggested mapping. That can do the scanning of data to guess at multi-valuedness once so you don't pay the cost of doing that in the live API.
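A minimal sketch of that one-off scan, assuming the sample data has already been loaded into plain Python dictionaries that map a property name to a list of values. The function names and the in-memory representation are illustrative assumptions, not part of any proposed API:

```python
def guess_multi_valued(resources):
    """Return the property names that carry more than one value on any resource."""
    multi = set()
    for resource in resources:
        for prop, values in resource.items():
            if len(values) > 1:
                multi.add(prop)
    return multi


def to_json_object(resource, multi_valued):
    """Serialise one resource, always using an array for multi-valued properties."""
    out = {}
    for prop, values in resource.items():
        if prop in multi_valued:
            out[prop] = list(values)   # consistent array, even for a single value
        elif values:
            out[prop] = values[0]      # safe: never more than one value in the sample
    return out


# Hypothetical sample data, reusing the "nick" values from the thread.
sample = [
    {"nick": ["JeniT"]},
    {"nick": ["wilding", "wilda"]},
]
multi = guess_multi_valued(sample)
objects = [to_json_object(r, multi) for r in sample]
```

The scan result would be stored in the mapping configuration, so the live API only has to look up whether a property is in the multi-valued set.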

[snip]
Language codes are effectively open ended. I can't necessarily predict what lang codes are going to be in my data and provide a property mapping for every single one.
I know they're *potentially* open-ended; I think in practice, for a single API, they are probably not.

Depends on whether this is your own data or you are harvesting/receiving from multiple other sources and passing it on (in which case you have a lot less control).

And even in the case of data that does have multiple languages (eg DBPedia) it would be possible to create a list based on the IANA language subtag registry [2] if you were concerned.

You could, but from the client's point of view trying all those property names in order to find a value it can use is going to be awkward.

Plus when working with language-tagged data you often have code to do a "best match" (not simple lookup) between the user's language preferences and the available lang tags. That looks hard if each is in a different property and the lang tags themselves are hidden in the API configuration.
I think we may need the long winded encoding available:

{
 "id" : "http://statistics.data.gov.uk/id/local-authority-district/00PB",
 "prefLabel" : [
   "The County Borough of Bridgend",
   { "value" : "The County Borough of Bridgend", "lang" : "en" },
   { "value" : "Pen-y-bont ar Ogwr", "lang" : "cy" }
 ]
 ...
Then it would be up to the publisher whether to provide the simpler properties as well or instead. But those could be regarded as transformations of the RDF for convenience (much like choosing to include RDFS closure info).
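To illustrate why the long-winded encoding helps with "best match", here is a rough sketch of a client-side lookup over a value list shaped like the "prefLabel" array above. The function name is hypothetical, and the matching is a simplified truncation scheme (try the full tag, then drop trailing subtags), not a complete implementation of any language-matching standard:

```python
def best_label(values, preferred_langs):
    """Pick a label for the first preferred language that has a match.

    `values` mixes plain strings and {"value": ..., "lang": ...} objects.
    """
    tagged = {}
    plain = None
    for v in values:
        if isinstance(v, dict):
            # Keep the first value seen for each language tag.
            tagged.setdefault(v["lang"].lower(), v["value"])
        elif plain is None:
            plain = v
    for pref in preferred_langs:
        tag = pref.lower()
        while tag:
            if tag in tagged:
                return tagged[tag]
            # Drop the last subtag: "cy-gb" -> "cy" -> "".
            tag = tag.rpartition("-")[0]
    return plain  # fall back to the untagged value, if any


labels = [
    "The County Borough of Bridgend",
    {"value": "The County Borough of Bridgend", "lang": "en"},
    {"value": "Pen-y-bont ar Ogwr", "lang": "cy"},
]
welsh = best_label(labels, ["cy-GB", "en"])
fallback = best_label(labels, ["fr"])
```

Because the language tags travel with the values, the client needs no knowledge of the API configuration to do this.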
As I say, I'm not convinced that this is a big enough issue to sweat over, but another possibility would be to perform some basic string manipulation to create separate properties as required. For example:
 {
   "_about": "http://statistics.data.gov.uk/id/local-authority-district/00PB",
   "prefLabel": "The County Borough of Bridgend",
   "prefLabel_en": "The County Borough of Bridgend",
   "prefLabel_cy": "Pen-y-bont ar Ogwr"
 }
Note that the language of the value of the property without the language suffix is probably something that you'd want in the API configuration (and possibly overridable by the client).
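The string manipulation described here could be sketched roughly as follows, assuming the server starts from values in the structured {"value", "lang"} form. The function name and the `default_lang` parameter are illustrative assumptions standing in for whatever the API configuration would supply:

```python
def flatten_lang_properties(obj, default_lang="en"):
    """Expand language-tagged values into suffixed properties (prefLabel_en, ...)."""
    out = {}
    for prop, value in obj.items():
        values = value if isinstance(value, list) else [value]
        if not any(isinstance(v, dict) and "lang" in v for v in values):
            out[prop] = value          # nothing language-tagged: copy unchanged
            continue
        for v in values:
            if isinstance(v, dict) and "lang" in v:
                lang = v["lang"].lower()
                out[prop + "_" + lang.replace("-", "_")] = v["value"]
                if lang == default_lang:
                    out[prop] = v["value"]   # unsuffixed property uses the default language
            else:
                out.setdefault(prop, v)      # untagged value, used if no default-language one
    return out


source = {
    "_about": "http://statistics.data.gov.uk/id/local-authority-district/00PB",
    "prefLabel": [
        {"value": "The County Borough of Bridgend", "lang": "en"},
        {"value": "Pen-y-bont ar Ogwr", "lang": "cy"},
    ],
}
flat = flatten_lang_properties(source)
```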

Yes, that is better, though I think Mark's literal encoding would be easier to work with than the encoding in the property name.

For things like xsd:dateTime then there seem to be a couple of options. The Simile type option would be to have them as strings but define the range of the property in some associated context/properties table.
The other would be to use a structured representation:

 {
     "id" : "http://example.com/ourpaper",
     "date" : { "type" : "date", "value" : "20091312"}
    ...
I'm guessing you would just have them as strings and let the consumer figure out when they want to treat them as dates, is that right?
That would be my preference, but I think the strings should (unfortunately) use formats understood by the Javascript Date.parse() method [3]. So the above would be:
  {
    "_about": "http://example.com/ourpaper",
    "date": "13 Dec, 2009"
  }


Ugh. I guess you are right, hadn't thought of that.
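For illustration, a server-side conversion to that string format might look like the sketch below. It assumes the stored value is an xsd:date lexical form ("YYYY-MM-DD", no timezone); the function name is hypothetical, and month names are hard-coded so the output does not depend on the server's locale:

```python
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def to_date_string(xsd_date):
    """Format an ISO "YYYY-MM-DD" date in the "13 Dec, 2009" style shown above."""
    d = date.fromisoformat(xsd_date)
    return f"{d.day} {MONTHS[d.month - 1]}, {d.year}"


formatted = to_date_string("2009-12-13")
```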

Cheers,
Dave
