[Dbpedia-discussion] Data returned on dbpedia.org/ontology/

Adrian Gschwend Wed, 30 May 2012 03:04:35 -0700

Hi group,

We work on some software which relies heavily on the ontologies used by
the data. This means we dereference the ontologies used in data sets and
do some inference to derive additional facts about the data. For
most ontologies this works pretty well.


Last week we were test driving our software against some data at
DBpedia, namely the page of Tim Berners-Lee at
http://dbpedia.org/resource/Tim_Berners-Lee

So far so good. In there we have several rdf:type statements, including
dbpedia-owl:Person, which points to http://dbpedia.org/ontology/Person

At that point we noticed that it took far too long to fetch the page,
cache it and process it. So we started analyzing the request and
repeated it by hand:

    % curl -I -H "Accept: application/rdf+xml" http://dbpedia.org/ontology/Person

    HTTP/1.1 303 See Other
    Date: Mon, 21 May 2012 19:00:08 GMT
    Content-Type: application/rdf+xml
    Connection: keep-alive
    Server: Virtuoso/06.04.3132 (Linux) x86_64-generic-linux-glibc25-64  VDB
    Accept-Ranges: bytes
    Location: http://dbpedia.org/data3/Person.rdf
    Content-Length: 0
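
For readers who want to repeat this check from a script, here is a
minimal Python sketch of the same HEAD request. It uses only the
standard library; the helper names (build_head_request, NoRedirect,
head_without_redirect) are made up for this illustration and are not
part of our software. It deliberately does not follow the redirect, so
the 303 and its Location header stay visible.

```python
import urllib.error
import urllib.request


def build_head_request(uri, accept="application/rdf+xml"):
    """Build a HEAD request with an Accept header, like `curl -I -H "Accept: ..."`."""
    return urllib.request.Request(uri, method="HEAD", headers={"Accept": accept})


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow 3xx responses."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the 3xx response.
        return None


def head_without_redirect(uri, timeout=30):
    """Return (status code, headers) of the first response, redirect not followed."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(build_head_request(uri), timeout=timeout) as resp:
            return resp.status, dict(resp.headers)
    except urllib.error.HTTPError as err:
        # A 303 ends up here because the redirect was not followed.
        return err.code, dict(err.headers)


# Example call (needs network access, so it is left commented out):
# status, headers = head_without_redirect("http://dbpedia.org/ontology/Person")
# print(status, headers.get("Location"))
```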

Not a problem, the system can handle redirects. So we get the other file
instead. And boy were we confused: it returns an 8 MB file for the
request (which took quite some time to download, by the way). After
analyzing it in rapper I figured out that we got about 50'000 triples.
Probably fewer than 20 of them are really related to the ontology, and
the rest is stuff like:

<http://dbpedia.org/resource/Zygmunt_Balicki>
    a <http://dbpedia.org/ontology/Person> .
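
To make the distinction concrete, here is a rough Python sketch of how
one could separate the triples that describe the class itself from the
instance assertions. It assumes the triples have already been parsed
into plain (subject, predicate, object) string tuples; the function name
and the two "definition" triples in the sample are illustrative
assumptions, not the actual content of the DBpedia response.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def split_class_description(triples, class_uri):
    """Split triples into (definition, instances, other) for class_uri.

    definition: triples whose subject is the class itself
    instances:  "X rdf:type class_uri" assertions about other resources
    other:      everything else
    """
    definition, instances, other = [], [], []
    for s, p, o in triples:
        if s == class_uri:
            definition.append((s, p, o))
        elif p == RDF_TYPE and o == class_uri:
            instances.append((s, p, o))
        else:
            other.append((s, p, o))
    return definition, instances, other


PERSON = "http://dbpedia.org/ontology/Person"

# Hypothetical sample: the first two triples are made up for illustration,
# the third is the instance assertion quoted in this mail.
sample = [
    (PERSON, RDF_TYPE, "http://www.w3.org/2002/07/owl#Class"),
    (PERSON, "http://www.w3.org/2000/01/rdf-schema#label", "person"),
    ("http://dbpedia.org/resource/Zygmunt_Balicki", RDF_TYPE, PERSON),
]

definition, instances, other = split_class_description(sample, PERSON)
print(len(definition), len(instances), len(other))  # prints: 2 1 0
```

Only the first group is what a client needs for inference over the
ontology; the second group is the part that makes up almost all of the
response described here.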

While I do see that this "reverse property", or whatever it is called,
might be interesting when I browse the data set in my web browser, it is
in my opinion plain wrong to return it for the URI which dereferences to
the ontology.

Our software is also targeted at smartphones. You can imagine that it
is not really fun to get 50'000 triples back on a crappy 3G link with
volume limits and then parse and cache them on a device which is running
on battery power. If I do that for several DBpedia data sets I will
probably be out of power very soon, without having fetched even half of
the ontologies used in the data.

What is your opinion on that? Is there a good reason for this or did you
just think it might be useful? As you can see, this pretty much kills the
way we use ontologies, and I think the "classical" way to dereference
ontologies makes far more sense. So I would vote to change this behavior
on DBpedia and return only the definition itself.


thanks

cu

Adrian
