On 15 April 2014 06:44, Patel-Schneider, Peter
<peter.patel-schnei...@nuance.com> wrote:
> Oops.  This doesn't catch the problematic situations, because the extra
> owl:Thing triple is not generated.
>
> It would be nice to have some documentation on just how the general type
> triples are added.

I'm afraid there is no better documentation than the code:
https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/mappings/TemplateMapping.scala

>
> peter
>
>
> On Apr 14, 2014, at 6:32 PM, "Patel-Schneider, Peter"
> <peter.patel-schnei...@nuance.com> wrote:
>
>
> On Apr 14, 2014, at 5:22 PM, Jona Christopher Sahnwaldt <j...@sahnwaldt.de>
>  wrote:
>
>
>
> On Monday, 14 April 2014, Patel-Schneider, Peter
> <peter.patel-schnei...@nuance.com> wrote:
>>
>>
>> On Apr 14, 2014, at 12:14 PM, Dimitris Kontokostas <jimk...@gmail.com>
>>  wrote:
>>
>>
>>
>> It is easier to have them [type links] in the db, and one can
>> programmatically remove them if needed. This way, for example, we can
>> request all persons and have the query answered without reasoning.
>>
>>
>> Unfortunately, it may not be possible to do this removal correctly.
>> Consider, for example, the case where the same bit of information in a
>> Wikipedia entry ends up producing two different types.
>
>
> Can't happen. Types are based on infoboxes. If a Wikipedia page contains two
> or more mapped infoboxes, we generate a new resource URI for each one. You
> can use a script (it's one line in awk) that goes through instance_types.nq
> line by line and only prints the first line for each resource.
>
> Once you're done with that I'll admit that I lied to you. There are some
> cases (based on somewhat intricate rules) when we *don't* generate a new
> resource URI for a second (or third...) infobox. In this case, the script
> would drop the types for the other infoboxes. But there's help: it is very
> unlikely that multiple infoboxes appear in the same line in the Wikipedia
> source, and the provenance URI contains the line number, which means that
> different infoboxes are very unlikely to have the same provenance URI. So
> your script should print the first type for each combination of resource URI
> (first field) and provenance URI (fourth field). Still one line in awk. It's
> not 100% precise, but probably 99.9% or more.
>
> I think this approach will work, but don't take my word for it. It's been a
> while since I worked on this stuff.
>
>>
>
>
> It appears that there are no cases in English DBpedia where two separate
> mappings are used with the same resource IRI.
>
> At least this is what an
>
>   fgrep owl#Thing instance_types_en.ttl | sort | uniq -d
>
> suggests.
>
>
> peter
>
> PS:  Looking at intermediate results in this pipeline shows that DBpedia has
> some issues with newlines in resource names.
>
>

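For reference, the filtering script described in the quoted message (print only the first type line for each combination of resource URI and provenance URI) could be sketched roughly as follows. This is a Python illustration of the idea, not the awk one-liner itself and not code from the extraction framework. It assumes each line of instance_types.nq has the form `<subject> <predicate> <object> <graph> .` with whitespace-separated fields, and the function name `first_type_per_infobox` is made up for this example.

```python
from typing import Iterable, Iterator, Set, Tuple


def first_type_per_infobox(lines: Iterable[str]) -> Iterator[str]:
    """Yield the first line seen for each (resource URI, provenance URI) pair.

    Assumption: fields are separated by whitespace, with the resource URI
    in the first field and the provenance URI in the fourth field.
    """
    seen: Set[Tuple[str, str]] = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        key = (fields[0], fields[3])  # (resource URI, provenance URI)
        if key not in seen:
            seen.add(key)
            yield line
```

To filter a dump, one could pass an open file (or `sys.stdin`) to the function and write the yielded lines to the output with `writelines`. As noted in the quoted message, the result is not 100% precise: two infoboxes that share a resource URI and a provenance URI would still be collapsed into one.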
_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
