On Apr 14, 2014, at 5:22 PM, Jona Christopher Sahnwaldt <j...@sahnwaldt.de>
wrote:
On Monday, 14 April 2014, Patel-Schneider, Peter
<peter.patel-schnei...@nuance.com> wrote:
On Apr 14, 2014, at 12:14 PM, Dimitris Kontokostas <jimk...@gmail.com>
wrote:
It is easier to have them [type links] in the db, and one can programmatically
remove them if needed. This way, for example, we can request all persons and
the query is answered without reasoning.
Unfortunately, it may not be possible to do this removal correctly. Consider,
for example, the case where the same bit of information in a Wikipedia entry
ends up producing two different types.
Can't happen. Types are based on infoboxes. If a Wikipedia page contains two or
more mapped infoboxes, we generate a new resource URI for each one. You can use
a script (it's one line in awk) that goes through instance_types.nq line by
line and only prints the first line for each resource.
Once you're done with that I'll admit that I lied to you. There are some cases
(based on somewhat intricate rules) when we *don't* generate a new resource URI
for a second (or third...) infobox. In this case, the script would drop the
types for the other infoboxes. But there's help: it is very unlikely that
multiple infoboxes appear in the same line in the Wikipedia source, and the
provenance URI contains the line number, which means that different infoboxes
are very unlikely to have the same provenance URI. So your script should print
the first type for each combination of resource URI (first field) and
provenance URI (fourth field). Still one line in awk. It's not 100% precise,
but probably 99.9% or more.
I think this approach will work, but don't take my word for it. It's been a
while since I worked on this stuff.
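For readers who prefer something more explicit than the awk one-liner (which
is not spelled out in this thread), here is a minimal Python sketch of the same
filtering. It assumes each line of instance_types.nq holds four
whitespace-separated terms (resource, predicate, type, provenance) followed by
" .", and that the first type line for a key is the one to keep. The function
name first_type_per_key and the sample IRIs are made up for illustration.

```python
def first_type_per_key(lines, use_provenance=True):
    """Yield the first line seen for each key.

    The key is (resource URI, provenance URI), i.e. fields 1 and 4,
    or only the resource URI when use_provenance is False.
    """
    seen = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or line.lstrip().startswith("#"):
            continue  # skip blank lines, comments, and malformed lines
        key = (fields[0], fields[3]) if use_provenance else fields[0]
        if key not in seen:
            seen.add(key)
            yield line


# Made-up sample data; real DBpedia IRIs and provenance URIs look different.
TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
sample = [
    f"<http://example.org/r/A> {TYPE} <http://example.org/o/Scientist> <http://example.org/wiki/A#line=1> .",
    f"<http://example.org/r/A> {TYPE} <http://example.org/o/Person> <http://example.org/wiki/A#line=1> .",
    f"<http://example.org/r/A> {TYPE} <http://example.org/o/Thing> <http://example.org/wiki/A#line=1> .",
    f"<http://example.org/r/A> {TYPE} <http://example.org/o/Band> <http://example.org/wiki/A#line=40> .",
]

for line in first_type_per_key(sample):
    print(line)
```

On this sample the sketch keeps the Scientist and Band lines, because the two
infoboxes sit on different source lines. With use_provenance=False only the
Scientist line survives, which is the simpler first-line-per-resource variant
and shows how it would drop the type from a second infobox.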
It appears that there are no cases in English DBpedia where two separate
mappings are used with the same resource IRI.
At least this is what an
fgrep owl#Thing instance_types_en.ttl | sort | uniq -d
suggests.
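For those without the Unix tools at hand, the check above can be approximated
in Python. This is only a sketch: it counts identical lines that contain
"owl#Thing" and reports each line seen more than once, which is what the
fgrep | sort | uniq -d pipeline does. The function name
duplicated_thing_lines is made up, and the ordering of the result may differ
from sort(1), whose order depends on the locale.

```python
from collections import Counter


def duplicated_thing_lines(lines):
    """Return, sorted, each line containing 'owl#Thing' that occurs more than once."""
    counts = Counter(
        line.rstrip("\n") for line in lines if "owl#Thing" in line
    )
    return sorted(line for line, n in counts.items() if n > 1)


# Usage, assuming the dump is in the current directory:
# with open("instance_types_en.ttl", encoding="utf-8") as f:
#     for dup in duplicated_thing_lines(f):
#         print(dup)
```

An empty result means no resource IRI received the owl#Thing type twice in
exactly the same form, which is the basis for the conclusion above.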
peter
PS: Looking at intermediate results in this pipeline shows that DBpedia has
some issues with newlines in resource names.
_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion