On 15 April 2014 03:05, Patel-Schneider, Peter
<peter.patel-schnei...@nuance.com> wrote:
>
> On Apr 14, 2014, at 5:22 PM, Jona Christopher Sahnwaldt <j...@sahnwaldt.de>
>  wrote:
>
>
>
> On Monday, 14 April 2014, Patel-Schneider, Peter
> <peter.patel-schnei...@nuance.com> wrote:
>>
>>
>> On Apr 14, 2014, at 12:14 PM, Dimitris Kontokostas <jimk...@gmail.com>
>>  wrote:
>>
>>
>>
>>
>> On Mon, Apr 14, 2014 at 6:55 PM, Patel-Schneider, Peter
>> <peter.patel-schnei...@nuance.com> wrote:
>>>
>>> So each mapping has to explicitly state that the object belongs to
>>> owl:Thing?
>>
>>
>> No, you just need to specify the lowest subclass and the framework adds
>> all the superclasses / equivalent classes
>>
>>
>> That's what I was expecting, but I couldn't find anything that talked
>> about this.   What is the source of the superclasses and equivalent classes?
>> The 3.9 ontology doesn't seem to match up with what is being added.  For
>> example http://www.wikidata.org/entity/Q215627 is equivalent to Person in
>> the 3.9 ontology, but it isn't added in the 3.9 dumps.
>
>
> That's not quite correct. The equivalent wikidata classes for Person were
> added to the mappings wiki in February 2014. DBpedia 3.9 is based on the
> contents of the mappings wiki as of ca. April 2013. That's why these
> equivalent classes are neither in the 3.9 dumps nor in the 3.9 ontology.
>
>
> Aaah, true.  What threw me off was that I was looking at several versions of
> the DBpedia ontology that I had grabbed off the web.  The 3.9 ontology  has
> versionInfo that says 3.8 (and I didn't have the version in the file name)
> so I looked for a version of the ontology where I did have 3.9 in the file
> name and I got the newer version (which also has versionInfo of 3.8, which I
> didn't notice).  I don't expect that it is worthwhile to go back and change
> the versionInfo for the 3.9 ontology, but it might be a good idea to give
> the current ontology a different version number, perhaps something like
> 3.9.1, and then change the ontology version number when 3.10 comes out.
>
>
> (By the way, Q215627 seems to be wrong: the Wikidata page says "don't use
> with instance-of". We should probably fix this on the mappings wiki.)
>
>>
>>
>>>
>>> And the mapping for Philosopher has to explicitly state that the object
>>> belongs to Person, foaf:Person, schema.org:Person, Agent, and owl:Thing?
>>> That doesn't seem right.
>>>
>>> My guess is that there is some other bit of software that explicitly puts
>>> in these extra type links.
>>>
>>>
>>> One reason that I ask is that I would like to not have these extra type
>>> links, so that I can run some experiments.
>>
>>
>> It is easier to have them in the db and one can programmatically remove
>> them if needed. This way for example we can request all persons and the
>> query is answered without reasoning
>>
>>
>> Unfortunately, it may not be possible to do this removal
>> correctly.  Consider, for example, if the same bit of information in a
>> Wikipedia entry ends up producing two different types.
>
>
> Can't happen. Types are based on infoboxes. If a Wikipedia page contains two
> or more mapped infoboxes, we generate a new resource URI for each one.
>
>
> Hmm.  This doesn't seem exhaustive.  There are three IRIs for
> <http://en.wikipedia.org/wiki/Abraham_Lincoln?oldid=548249630#absolute-line=5>,
> for example.  So it appears that a single infobox is producing multiple
> IRIs.

Correct.

> Perhaps you meant to add that if there are multiple mappings that
> trigger on an infobox that there is a new resource IRI generated for each
> one.

That's another special case.

http://dbpedia.org/resource/Abraham_Lincoln__1 and
http://dbpedia.org/resource/Abraham_Lincoln__2 are what we call
'intermediate nodes' (basically blank nodes). In this case, they are
used to group information about Lincoln's terms in office. The data is
extracted from the same infobox as the main stuff, and the infobox
starts in line 5, so they get the same provenance URI.

Depending on your needs, it might be nice to remove these intermediate
nodes completely. Unfortunately, that won't be easy based solely on
syntax. If there are multiple infoboxes on one page, we also generate
new URIs, but they often look exactly like the URIs generated for
intermediate nodes, i.e. with something like '__1' appended to the
main URI.
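
To illustrate why syntax alone is not enough, here is a minimal Python sketch
that splits off a trailing '__<number>' suffix. The function name and the
exact suffix pattern are assumptions for illustration, not part of the
extraction framework. It can flag candidate URIs, but it cannot tell an
intermediate node from a resource generated for an additional infobox.

```python
import re

# Assumption: generated URIs end in two underscores plus a number, e.g. "__1".
# The pattern is a guess based on the examples in this thread.
_SUFFIX = re.compile(r"^(?P<base>.+?)__(?P<index>\d+)$")


def split_generated_uri(uri):
    """Return (base_uri, index) for URIs like '..._Lincoln__1', else (uri, None).

    Hypothetical helper: a match only means the URI *looks* generated.
    It may be an intermediate node or a resource for a second infobox.
    """
    match = _SUFFIX.match(uri)
    if match is None:
        return uri, None
    return match.group("base"), int(match.group("index"))
```

The helper expects a bare URI, so the angle brackets used in the .nq files
would have to be stripped first.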


>
> | You can use a script (it's one line in awk) that goes through
> instance_types.nq line by line and only prints the first line for each
> resource.
>
> Hmm.  I would like to see some documentation that describes how this works,
> and that can be pointed to in published work.
>
>
> Once you're done with that I'll admit that I lied to you. There are some
> cases (based on somewhat intricate rules) when we *don't* generate a new
> resource URI for a second (or third...) infobox. In this case, the script
> would drop the types for the other infoboxes. But there's help: it is very
> unlikely that multiple infoboxes appear in the same line in the Wikipedia
> source, and the provenance URI contains the line number, which means that
> different infoboxes are very unlikely to have the same provenance URI. So
> your script should print the first type for each combination of resource URI
> (first field) and provenance URI (fourth field). Still one line in awk. It's
> not 100% precise, but probably 99.9% or more.
>
> I think this approach will work, but don't take my word for it. It's been a
> while since I worked on this stuff.
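
The filtering described above could be sketched in Python roughly as follows.
This is an illustration, not the awk one-liner itself: the function name is
made up, and it assumes that every line of instance_types.nq holds four
whitespace-separated IRIs followed by a separate '.', with no spaces inside
any term.

```python
def first_type_per_source(lines):
    """Yield the first quad seen for each (resource, provenance) pair.

    Hypothetical helper. Assumes each quad line splits on whitespace into
    subject, predicate, object, provenance IRI and a final '.'.
    """
    seen = set()
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments and anything that is not a full quad.
        if len(fields) < 5 or fields[0].startswith("#"):
            continue
        key = (fields[0], fields[3])  # resource URI, provenance URI
        if key not in seen:
            seen.add(key)
            yield line
```

Passing an open file handle for instance_types.nq as `lines` streams the file
without loading it into memory. Like the approach it mirrors, this keeps
whichever type comes first in the file and inherits the same imprecision when
two infoboxes share a provenance URI.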
>
>
> Unfortunately, even 99.9% might not be good enough, if I end up comparing
> ontology-based approaches for data cleaning to other approaches.
>
>
>>
>>
>> Then removing what appears to be redundant type information may in fact
>> remove a separate source of information.
>>
>>
>>
>>>
>>> It would also have been nice to have the provenance information show
>>> which mapping was used.
>>
>>
>> We have some sort of provenance in the .nq files, such as where in the
>> wiki page each triple was extracted
>>
>>
>> Yes, I've seen this, but there is no information that I can see on what
>> mapping was used.
>
>
> Yes, that's a bummer.
>
>
> Particularly if one wants to investigate which mapping rules are causing
> inconsistencies.
>
>
> Maybe there should be a new DBpedia dataset that contains links from
> resource and class URIs to mappings wiki pages. It would be nice if users
> who browse pages like dbpedia.org/ontology/Person or
> dbpedia.org/resource/Bono could click on a link that takes them to the
> mappings wiki where they can improve things.
>
>
> JC
>
>
> peter
>
>
>
>>
>>
>>
>> Best,
>> Dimitris
>>
>>>
>>>
>>>
>>> peter
>>>
>>> PS:  [1] doesn't talk about how these extra type links are generated, as
>>> far as I can tell.  The only relevant portion of the paper is on page 4:  "A
>>> mapping assigns a type from the DBpedia ontology to the entities that are
>>> described by the corresponding infobox."
>>>
>>
>>
>> peter
>>
>

_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
