Hi group,

We work on some software which relies heavily on the ontologies used by the data. This means we dereference the ontologies used in data sets and do some inference to figure out additional stuff about the data. For most ontologies this works pretty well.
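For readers who have not done this before, the dereferencing step mentioned above could look roughly like the sketch below. It is not our actual code: the names build_request and dereference are made up for illustration, and it only uses Python's standard urllib, which follows redirects such as 303 See Other on its own.

```python
import urllib.request

RDF_XML = "application/rdf+xml"


def build_request(uri, accept=RDF_XML):
    """Prepare a GET request that asks the server for an RDF serialization."""
    return urllib.request.Request(uri, headers={"Accept": accept})


def dereference(uri, timeout=30):
    """Fetch the RDF document behind `uri`.

    Returns the URL that was finally served (after any redirects)
    together with the raw response body as bytes.
    """
    with urllib.request.urlopen(build_request(uri), timeout=timeout) as resp:
        return resp.geturl(), resp.read()
```

Calling dereference("http://dbpedia.org/ontology/Person") would then return the final document URL and its bytes, which can be handed to an RDF parser.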
Last week we were test-driving our software against some data at DBpedia, namely the page of Tim Berners-Lee at http://dbpedia.org/resource/Tim_Berners-Lee

So far so good. In there we have several rdf:type statements, including dbpedia-owl:Person, which points to http://dbpedia.org/ontology/Person

At that point we noticed that it took way too long to get the page, cache it and do some stuff with it. So we started analyzing it by hand:

% curl -I -H "Accept: application/rdf+xml" http://dbpedia.org/ontology/Person
HTTP/1.1 303 See Other
Date: Mon, 21 May 2012 19:00:08 GMT
Content-Type: application/rdf+xml
Connection: keep-alive
Server: Virtuoso/06.04.3132 (Linux) x86_64-generic-linux-glibc25-64 VDB
Accept-Ranges: bytes
Location: http://dbpedia.org/data3/Person.rdf
Content-Length: 0

Not a problem, the system can handle redirects. So we get the other file instead. And boy were we confused: it returns an 8 MB file for the request (which took quite some time to get, btw).

After analyzing it in rapper I figured out that we got about 50'000 triples. Probably fewer than 20 are really related to the ontology, and the rest is stuff like:

<http://dbpedia.org/resource/Zygmunt_Balicki> a <http://dbpedia.org/ontology/Person> .

While I do see that this "reverse property", or whatever it is called, might be interesting when I browse the data set in my web browser, it is in my opinion plain wrong to return it on the URI which dereferences the ontology.

Our software is also targeted at smartphones. You can imagine that it is not really fun to get 50'000 triples back on a crappy 3G link with volume limits and then parse and cache them on a device which is running on battery power. If I do that on several DBpedia data sets I'm probably out of power very soon, without having fetched even half of the ontologies used in the data.

What is your opinion on that? Is there a good reason for this, or did you just think it might be useful?
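To make the distinction concrete, here is a small sketch of how one could separate the triples that describe the class itself from the "reverse" ones. It assumes the document has already been converted to N-Triples (for example with rapper), and it only looks at the subject of each line, so it is a deliberately simplified stand-in for a real RDF parser. The function names subject_of and split_by_subject are made up for illustration.

```python
def subject_of(ntriples_line):
    """Return the subject IRI of one N-Triples line.

    Returns None for lines that do not start with an IRI
    (blank lines, comments, blank-node subjects).
    """
    line = ntriples_line.strip()
    if not line.startswith("<"):
        return None
    end = line.find(">")
    return line[1:end] if end != -1 else None


def split_by_subject(ntriples_text, term_uri):
    """Split N-Triples lines into those about `term_uri` and all others."""
    about_term, other = [], []
    for raw in ntriples_text.splitlines():
        line = raw.strip()
        # Skip empty lines and comments.
        if not line or line.startswith("#"):
            continue
        if subject_of(line) == term_uri:
            about_term.append(line)
        else:
            other.append(line)
    return about_term, other
```

Note that this only reduces what has to be cached and reasoned over. The full 8 MB still has to be downloaded first, which is exactly the part that hurts on a mobile link.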
As you can see, this pretty much kills the way we use ontologies, and I think the "classical" way to dereference ontologies makes way more sense. So I would vote to change this behavior on DBpedia and return only the definition itself.

thanks

cu Adrian

_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion