Hi Jeni,
Jeni Tennison wrote:
On 13 Dec 2009, at 13:34, Dave Reynolds wrote:
I agree we want both graphs and SPARQL results, but I think there is
a third case: lists of described objects.
I absolutely agree with you that lists of described objects is an
essential part of an API. In fact, I was going to (and will!) write a
separate message about possible approaches for creating such lists.
It seemed to me that lists could be represented with RDF like:
<http://statistics.data.gov.uk/doc/local-authority?page=1>
rdfs:label "Local Authorities - Page 1" ;
xhv:next <http://statistics.data.gov.uk/doc/local-authority?page=2> ;
...
api:contents (
<http://statistics.data.gov.uk/id/local-authority/00QA>
<http://statistics.data.gov.uk/id/local-authority/00QB>
<http://statistics.data.gov.uk/id/local-authority/45UB>
...
)
This is just RDF, and as such any rules that we create about mapping RDF
graphs to JSON could apply. (I agree that the list page should include
extra information about the items in the list, but that seems to me to
be a separable issue.)
Sure but there are some advantages to treating this ordered list of
results as an API issue rather than a modelling issue.
I'll respond properly on your other thread.
One thing it makes me think is that perhaps JSON Schema [1] could form
the basis of the mechanism for expressing any extra stuff that's
required about the properties.
Interesting thought, I'll need to go learn more about JSON Schema first.
Note that the "$" is taken from RDFj. I'm not convinced it's a good
idea to use this symbol, rather than simply a property called "about"
or "this" -- any opinions?
I'd prefer "id" (though "about" is OK); "$" is too heavily overused in
JavaScript libraries.
I agree. From the brief survey of JSON APIs that I did just now, it
seems as though prefixing a reserved property name with a '_' is the
usual thing. I'd suggest '_about' because it's similar to RDFa and
because '_id', to me at least, implies a local identifier rather than a
URI.
No objection to "_about", as per separate thread it was Freebase
especially that motivated the suggestion of "id".
[On api:mapping usage]
Are you thinking of this as something the publisher provides or the
API caller provides?
If the former, then OK, but as I say I think a zero-config set of
default conventions is OK, with the API configuration allowing fine tuning.
I'm thinking of this as something that the publisher of the API creates
(to describe/define the API). Note, though, that the publisher of the
API might not be the publisher of the data, and that it could feasibly
be possible for there to be a service that would allow clients to supply
a configuration, point at a datastore, and have the API just work.
OK, agreed. My concern is that developers shouldn't have to wade through
this mapping to understand what they are getting, unless they are
already RDF heads and care about that aspect.
[On multi-valued properties]
I guess there are two choices if there was no specification:
1. always give one value for the property; if there are several values
in the graph, then provide "the first"
2. give an array when there are multiple values and a singleton when
there's only one
I did have another vague notion of providing two properties side by
side, one singular and one plural, so you would have:
{
"nick": "JeniT"
}
or
{
"nicks": ["wilding", "wilda"]
}
side by side in the same list of objects. But of course that would
require configuration anyway (to provide pluralised versions of the
label), so I'm not particularly taken with it.
It does concern me that if there are RDF graphs which contain
descriptions of several resources of the same type, we might get into a
situation where there are two resources for which the default behaviour
would be different; we need to have a way of reconciling this (for
example, if any of the resources in the graph have multiple values for a
property, then it always uses an array).
Yes. With zero configuration there will always either be some
inconsistency or you have to force the more general convention on
people. I agree with Mark that developers can write code to adapt to the
list/no-list case and with configuration we have the option to make this
more consistent in places where this is a problem.
One possibility is a bootstrapping service where you give sample data
and an ontology, if available, and get back a suggested mapping. That can
do the scanning of data to guess at multi-valuedness once, so you don't
pay the cost of doing that in the live API.
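To make the bootstrapping idea concrete, here is a minimal sketch of the scan-once step. It assumes each sample resource has already been reduced to a dict of property name to list of values; the function names guess_multivalued and to_json_object are hypothetical, not part of any proposed API.

```python
def guess_multivalued(resources):
    """Return the property names that have more than one value
    on at least one resource in the sample data."""
    multi = set()
    for props in resources:
        for name, values in props.items():
            if len(values) > 1:
                multi.add(name)
    return multi


def to_json_object(props, multi):
    """Serialise one resource: always an array for properties flagged
    as multi-valued, a bare value for everything else."""
    out = {}
    for name, values in props.items():
        if not values:
            continue  # nothing to emit for an empty property
        out[name] = list(values) if name in multi else values[0]
    return out


# Sample data (assumed shape): property name -> list of values.
sample = [
    {"nick": ["JeniT"]},
    {"nick": ["wilding", "wilda"]},
]
multi = guess_multivalued(sample)
objects = [to_json_object(r, multi) for r in sample]
```

With this, "nick" comes out as an array on both objects, so two resources in the same list never disagree about list/no-list. The result of guess_multivalued is what would be stored in the suggested mapping.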
[snip]
Language codes are effectively open ended. I can't necessarily predict
what lang codes are going to be in my data and provide a property
mapping for every single one.
I know they're *potentially* open-ended; I think in practice, for a
single API, they are probably not.
Depends on whether this is your own data or you are harvesting/receiving
from multiple other sources and passing it on (in which case you have a
lot less control).
And even in the case of data that
does have multiple languages (eg DBPedia) it would be possible to create
a list based on the IANA language subtag registry [2] if you were
concerned.
You could but from the client's point of view trying all those property
names in order to find a value it can use is going to be awkward.
Plus when working with language-tagged data you often have code to do
a "best match" (not simple lookup) between the user's language
preferences and the available lang tags. That looks hard if each is in
a different property and the lang tags themselves are hidden in the
API configuration.
I think we may need the long winded encoding available:
{
"id" : "http://statistics.data.gov.uk/id/local-authority-district/00PB",
"prefLabel" : [
"The County Borough of Bridgend",
{ "value" : "The County Borough of Bridgend", "lang" : "en" },
{ "value" : "Pen-y-bont ar Ogwr", "lang" : "cy" }
]
...
Then it would be up to the publisher whether to provide the simpler
properties as well or instead. But those could be regarded as
transformations of the RDF for convenience (much like choosing to
include RDFS closure info).
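As a rough illustration of why the long-winded encoding helps with "best match", here is a sketch of a client-side lookup over it. The function name best_label and the matching rules (exact tag, then prefix, then the untagged string) are my assumptions and a simplification, not a full language-range matcher.

```python
def best_label(values, preferred):
    """Pick a label from the long-form encoding, given the user's
    language preferences in priority order."""
    tagged = [v for v in values if isinstance(v, dict) and "lang" in v]
    for want in preferred:
        want = want.lower()
        # Exact tag match first, e.g. "cy" against "cy".
        for v in tagged:
            if v["lang"].lower() == want:
                return v["value"]
        # Then a prefix match, so a preference of "en" accepts "en-GB".
        for v in tagged:
            if v["lang"].lower().startswith(want + "-"):
                return v["value"]
    # Fall back to the first plain (untagged) string, if there is one.
    for v in values:
        if isinstance(v, str):
            return v
    return tagged[0]["value"] if tagged else None


pref_label = [
    "The County Borough of Bridgend",
    {"value": "The County Borough of Bridgend", "lang": "en"},
    {"value": "Pen-y-bont ar Ogwr", "lang": "cy"},
]
welsh = best_label(pref_label, ["cy", "en"])
fallback = best_label(pref_label, ["fr"])
```

The point is that the lang tags are visible in the data, so the client can do this without knowing anything about the API configuration.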
As I say, I'm not convinced that this is a big enough issue to sweat
over, but another possibility would be to perform some basic string
manipulation to create separate properties as required. For example:
{
"_about" :
"http://statistics.data.gov.uk/id/local-authority-district/00PB",
"prefLabel": "The County Borough of Bridgend",
"prefLabel_en": "The County Borough of Bridgend",
"prefLabel_cy": "Pen-y-bont ar Ogwr"
}
Note that the language of the value of the property without the language
suffix is probably something that you'd want in the API configuration
(and possibly overridable by the client).
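For what it's worth, the string manipulation could be as small as the following sketch. It assumes the literals arrive as (value, lang) pairs and that the default language comes from the API configuration; flatten_labels and default_lang are hypothetical names.

```python
def flatten_labels(name, literals, default_lang="en"):
    """Turn (value, lang) pairs into suffixed properties such as
    prefLabel_en, plus an unsuffixed property for the default language."""
    out = {}
    for value, lang in literals:
        # "en-GB" becomes "en_GB" so the suffix stays a plain identifier.
        suffix = lang.replace("-", "_")
        # If a language has several values, the last one wins here.
        out[f"{name}_{suffix}"] = value
        if lang == default_lang:
            out[name] = value
    return out


flat = flatten_labels(
    "prefLabel",
    [
        ("The County Borough of Bridgend", "en"),
        ("Pen-y-bont ar Ogwr", "cy"),
    ],
)
```

Note that this quietly drops all but one value per language, which is another argument for keeping the structured form available.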
Yes that is better though I think Mark's literal encoding would be
easier to work with than the encoding in the property name.
For things like xsd:dateTime there seem to be a couple of options. The
Simile type option would be to have them as strings but define the
range of the property in some associated context/properties table.
The other would be to use a structured representation:
{
"id" : "http://example.com/ourpaper",
"date" : { "type" : "date", "value" : "20091213"}
...
I'm guessing you would just have them as strings and let the consumer
figure out when they want to treat them as dates, is that right?
That would be my preference, but I think the strings should
(unfortunately) use formats understood by the Javascript Date.parse()
method [3]. So the above would be:
{
"_about": "http://example.com/ourpaper",
"date": "13 Dec, 2009"
}
Ugh. I guess you are right, hadn't thought of that.
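On the server side the conversion is at least cheap. A minimal sketch, assuming the stored value is an xsd:date or xsd:dateTime lexical form starting with YYYY-MM-DD; the function name to_display_date is hypothetical, and it deliberately discards any time and timezone part.

```python
from datetime import date

# Spelled out rather than using strftime("%b"), which depends on the locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def to_display_date(xsd_value):
    """Convert '2009-12-13' or '2009-12-13T10:00:00Z' to '13 Dec, 2009'."""
    d = date.fromisoformat(xsd_value[:10])  # keep only the date part
    return f"{d.day} {MONTHS[d.month - 1]}, {d.year}"


paper = {
    "_about": "http://example.com/ourpaper",
    "date": to_display_date("2009-12-13"),
}
```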
Cheers,
Dave