Hi.

I have argued for a long time that the linkage data (in particular owl:sameAs 
and similar links) should not usually be mixed with the knowledge being 
published.

Thus, for example, as I discussed with Evan for the NYTimes site a while ago, it 
is not a good thing to give the owl:sameAs links (which were produced by a 
relatively unskilled individual over a short period of time) the same status 
as the other data, which has been curated over decades by expert reporters.

These sameAs links have potentially very different trust, provenance, licence, 
and possibly other non-functional attributes from the substantive data.
Clearly they have different trust and provenance, and the licence may well be 
different too, as the NYT may want people to take the sameAs triples away to 
bring traffic to their site, while keeping the other triples under a more 
restricted licence.

Which brings me to an example of where things have recently gone badly wrong.
I have reported a bug to the dbpedia team wherein the URIs for countries have 
become deeply intertwingled.
An example query is at the end of this message. It has to follow the 
owl:sameAs links explicitly because the store does not do owl:sameAs 
inference, but the outcome is that I can validly infer answers such as 
"Maseru is the capital of Belgium".

Of course, mistakes happen, so I am not having a specific go at dbpedia, which 
I still think is wonderful.

But the outcome is that I get very bad data from dbpedia.org unexpectedly, 
which means I (and presumably anyone else) can't reliably use dbpedia.org at 
all (because I use an inference engine when I cache the data).
Had the dbpedia.org site simply stuck to the behaviour I was more or less 
expecting, namely publishing data from wikipedia (possibly publishing the 
linkage data elsewhere), I would have been in a better position.
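
A rough sketch of what "publishing the linkage data elsewhere" could look like 
from a consumer's side. This is only an illustration: the Graph class, the 
graph names, the licence labels and the load() helper are all invented for 
this example and do not describe how dbpedia.org actually publishes anything.

```python
from dataclasses import dataclass, field

@dataclass
class Graph:
    name: str
    source: str       # provenance: who produced these triples (made-up labels)
    licence: str      # placeholder licence label, not a real licence
    triples: list = field(default_factory=list)

# Substantive data and linkage data kept in separate graphs.
data = Graph(
    name="data",
    source="extracted from wikipedia",
    licence="licence-A",
    triples=[("dbr:Belgium", "dbo:capital", "dbr:City_of_Brussels")],
)
links = Graph(
    name="links",
    source="automatic linker",
    licence="licence-B",
    # ex:Country is a hypothetical external URI.
    triples=[("ex:Country", "owl:sameAs", "dbr:Belgium")],
)

def load(graphs, accept):
    """Merge only the graphs the consumer decides to accept."""
    return [t for g in graphs if accept(g) for t in g.triples]

# A cautious consumer takes the substantive data and leaves the links behind.
trusted = load([data, links], accept=lambda g: g.name == "data")
everything = load([data, links], accept=lambda g: True)
```

With the two kinds of triples separated like this, rejecting the linkage data 
does not mean rejecting the whole dataset.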

One of the issues here is to realise when we are actually adding knowledge 
during a triplification process.
It is clear that knowledge is being added when things like owl:sameAs are added.
However, it is probably less clear to people that they are also adding their 
own knowledge when they directly use URIs from dbpedia or elsewhere.
Such use similarly introduces knowledge which may have very different trust 
and provenance from the data being triplified.

Is this a good way to do things?

I would say not.
I have used a wide variety of Linked Data sources, and have found problems with 
almost every one of them (possibly every significant one).
The problems frequently relate to the extra knowledge that the triplification 
process has introduced.
If only I could be given the data without it, then I would not have to reject 
the dataset.

Thanks for reading this far.
Best
Hugh

Query:
SELECT DISTINCT ?capital WHERE {
 ?s owl:sameAs <http://dbpedia.org/resource/Belgium> .
 ?s owl:sameAs ?country .
 ?country <http://dbpedia.org/ontology/capital> ?capital .
}

As a URI:
http://dbpedia.org/snorql/?query=SELECT+DISTINCT+%3Fcapital+WHERE+%7B%0D%0A+%3Fs+owl%3AsameAs+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FBelgium%3E+.%0D%0A+%3Fs+owl%3AsameAs+%3Fcountry+.%0D%0A+%3Fcountry+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2Fcapital%3E+%3Fcapital+.%0D%0A%7D%0D%0A

Output:
capital
http://dbpedia.org/resource/City_of_Brussels
http://dbpedia.org/resource/Maseru
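
For anyone who wants to see why an inference engine ends up with this answer, 
here is a minimal sketch in Python of what owl:sameAs closure does to a toy 
set of triples. The data is assumed, not taken from dbpedia: ex:Country is a 
hypothetical intermediate URI, and dbr:Lesotho is my guess at the other 
country involved (Maseru being the capital of Lesotho). The function names are 
invented for the example.

```python
OWL_SAME_AS = "owl:sameAs"
CAPITAL = "dbo:capital"

# Toy triples. ex:Country stands in for whatever resource was linked to
# both countries; it is hypothetical, as is the choice of dbr:Lesotho.
triples = [
    ("ex:Country", OWL_SAME_AS, "dbr:Belgium"),
    ("ex:Country", OWL_SAME_AS, "dbr:Lesotho"),
    ("dbr:Belgium", CAPITAL, "dbr:City_of_Brussels"),
    ("dbr:Lesotho", CAPITAL, "dbr:Maseru"),
]

def same_as_classes(triples):
    """Union-find over owl:sameAs links; returns a URI -> representative function."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            parent[find(s)] = find(o)  # merge the two equivalence classes
    return find

def capitals_of(country, triples, follow_same_as):
    """Capitals asserted for the country, optionally merging sameAs-equivalent URIs."""
    find = same_as_classes(triples) if follow_same_as else (lambda x: x)
    target = find(country)
    return sorted(o for s, p, o in triples if p == CAPITAL and find(s) == target)

without_links = capitals_of("dbr:Belgium", triples, follow_same_as=False)
with_links = capitals_of("dbr:Belgium", triples, follow_same_as=True)
```

Without the sameAs links Belgium has only its own capital; once the links are 
followed, the two countries collapse into one equivalence class and Maseru 
shows up as well.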


-- 
Hugh Glaser,  
              Web and Internet Science
              Electronics and Computer Science,
              University of Southampton,
              Southampton SO17 1BJ
Work: +44 23 8059 3670, Fax: +44 23 8059 3045
Mobile: +44 75 9533 4155 , Home: +44 23 8061 5652
http://www.ecs.soton.ac.uk/~hg/


