On Apr 14, 2014, at 5:22 PM, Jona Christopher Sahnwaldt 
<j...@sahnwaldt.de>
 wrote:



On Monday, 14 April 2014, Patel-Schneider, Peter 
<peter.patel-schnei...@nuance.com> 
wrote:

On Apr 14, 2014, at 12:14 PM, Dimitris Kontokostas 
<jimk...@gmail.com>
 wrote:



It is easier to have them [type links] in the db and one can programmatically 
remove them if needed. This way for example we can request all persons and the 
query is answered without reasoning.

Unfortunately, it may not be possible to do this removal correctly.  
Consider, for example, if the same bit of information in a Wikipedia entry ends 
up producing two different types.

Can't happen. Types are based on infoboxes. If a Wikipedia page contains two or 
more mapped infoboxes, we generate a new resource URI for each one. You can use 
a script (it's one line in awk) that goes through instance_types.nq line by 
line and only prints the first line for each resource.

Once you're done with that I'll admit that I lied to you. There are some cases 
(based on somewhat intricate rules) when we *don't* generate a new resource URI 
for a second (or third...) infobox. In this case, the script would drop the 
types for the other infoboxes. But there's help: it is very unlikely that 
multiple infoboxes appear in the same line in the Wikipedia source, and the 
provenance URI contains the line number, which means that different infoboxes 
are very unlikely to have the same provenance URI. So your script should print 
the first type for each combination of resource URI (first field) and 
provenance URI (fourth field). Still one line in awk. It's not 100% precise, 
but probably 99.9% or more.
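
A rough sketch of that one-liner, wrapped in a shell function. The function 
name is made up for illustration, and it assumes each line of the .nq file is 
a whitespace-separated quad (subject, predicate, object, provenance) with no 
spaces inside the URIs:

```shell
# Hypothetical helper (the name is illustrative, not part of any DBpedia tooling).
# Assumes N-Quads input: subject, predicate, object, provenance URI,
# separated by whitespace, one statement per line.
first_type_per_infobox() {
  # seen[$1, $4] is 0 the first time a (resource, provenance) pair occurs,
  # so the pattern is true only then and awk prints that line.
  # Later lines with the same pair are skipped.
  awk '!seen[$1, $4]++' "$@"
}
```

It could be run as `first_type_per_infobox instance_types.nq > first_types.nq`; 
with no file argument it reads standard input. Using `seen[$1]` alone would 
give the simpler first-line-per-resource variant.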

I think this approach will work, but don't take my word for it. It's been a 
while since I worked on this stuff.




It appears that there are no cases in English DBpedia where two separate 
mappings are used with the same resource IRI.

At least this is what an

  fgrep owl#Thing instance_types_en.ttl | sort | uniq -d

suggests.


peter

PS:  Looking at intermediate results in this pipeline shows that DBpedia has 
some issues with newlines in resource names.

_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
