I've been poking about a bit and found the following things:

    (1) The browser at 
http://mappings.dbpedia.org/server/ontology/classes doesn't work in 
Google Chrome.  It works OK for me in Firefox.

    (2) I'd like to see more definite comments for the items.  For 
instance,  I'd like to see something in the definition of 'City' that 
gives a specific answer to the Tokyo and London question.  If this isn't 
stated somewhere,  things are either going to be random or we'll be 
having edit wars.  Personally I'd be willing to put my opinion in 
there,  but I'd like to see some process as to how this gets done.

    (3) There is no one simple reason for why the city assignments get 
lost because the infobox mappings are pretty complicated.  For 
instance,  places like Manchester NH,  NYC and Sao Paulo have an 
Infobox:Settlement,  and the "City" designation should be triggered by

settlement_type = City

I noticed however,  that Manchester has

settlement_type = [[City]]

which is "reasonable" (certainly reflects linked data thinking) but I 
don't know if the extractor is going to get that.  On the other hand,  
if you look at the entry for Dresden,  Dresden has 
Infobox:German_Location and the "Citiness" of Dresden is triggered by 
the line

Art = City

in the infobox.  There's also an Infobox for Japanese_City,  so I'm sure 
that there are a lot of details.
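
    To make the `[[City]]` versus `City` point concrete,  here is a rough 
Python sketch of the kind of normalization that would let both spellings 
trigger the same rule.  The rule table and the function names are made up 
for illustration,  using only the infoboxes mentioned above;  I haven't 
checked what the extractor framework actually does with wikilinks.

```python
import re

# Matches [[Target]] and [[Target|label]]; captures the label if there is
# one, otherwise the target.
WIKILINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")

# Hypothetical rule table built from the examples in this message:
# (infobox template, field) -> normalized value that should mean "City".
CITY_RULES = {
    ("Settlement", "settlement_type"): "city",
    ("German_Location", "Art"): "city",
}


def normalize(value):
    """Strip wikilink markup and surrounding whitespace, then lowercase."""
    return WIKILINK.sub(r"\1", value).strip().lower()


def is_city(template, fields):
    """True if any rule for this template fires on the normalized field value."""
    for (tmpl, field), wanted in CITY_RULES.items():
        if tmpl == template and normalize(fields.get(field, "")) == wanted:
            return True
    return False
```

With this,  `is_city("Settlement", {"settlement_type": "[[City]]"})` and 
the plain-text `City` variant give the same answer.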

    (4) If there's a root cause for the problem,  it's that there isn't 
a closed feedback loop.  If you look at this as a problem of 
"transforming something from form A to form B",  it's clear that the 
system produces "B".  It's only when you actually try to use "B" that 
you find that "B" is full of holes.  Overall it's a system problem:  I'm 
sure that we can get better results by changing the extractor rules (in 
fact,  we'll get the fastest gains this way) but some changes to 
Wikipedia content will be necessary too.

    The complexity of the infoboxes means that an agent that does these 
corrections could be a bit complex,  although its behavior could 
probably be controlled by the infobox mappings.  For instance,  it may 
end up doing something a bit different for a "German Location" than it 
would for a "Settlement".

    Along the way it's also tempting to do some canonicalization.  For 
instance,  the word "City" in the infobox header for NYC is just plain 
text,  but the word "City" for Manchester NH is a hyperlink.  You can 
make a case for both,  but from a quality standpoint,  the same thing 
should be done in both cases.

    In the case of the German locations I see that the English words 
"Town" and "City" are often used in the "type" and "art" fields,  but 
the word "Stadt" is treated by the framework as if it were synonymous 
with "Town",  which,  from what little German I know,  isn't quite right 
(isn't Munich a /Großstadt/?).  But perhaps the word "Stadt" has some 
special semantics in the context of Wikipedia,  and it ought to be 
preserved.  Ultimately it seems that Wikipedia ought to make up its mind.

    Overall,  correcting Wikipedia is going to involve dealing with 
entropy,  dealing with politics,  and probably the careful re-injection 
of entropy to satisfy political constraints.

    (5) I can think of a lot of toolage that would be useful here.  For 
instance,  it would be nice to be able to look at "City" and get a list 
of rules that would cause something to be identified as a "City".  If I 
go down this path far enough,  I'm probably going to buy a new (wicked 
fast) hard drive,  install the extractor framework,  and want to get 
justifications about why the system made the assignments that it did and 
progressively identify the causes of misidentifications.
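
    The sort of tool I have in mind could be sketched like this in 
Python.  Everything here is hypothetical:  the rules are stored as plain 
tuples,  the first two come from the infoboxes discussed in (3),  and the 
"Town" rule is invented just to have a second class.  The real mappings 
are certainly richer than this.

```python
# Hypothetical rules as (template, field, value, ontology class).
RULES = [
    ("Settlement", "settlement_type", "City", "City"),
    ("German_Location", "Art", "City", "City"),
    ("Settlement", "settlement_type", "Town", "Town"),  # invented example
]


def rules_for(cls):
    """List every (template, field, value) that would assign this class."""
    return [rule[:3] for rule in RULES if rule[3] == cls]


def explain(template, fields):
    """Return (class, justification) for each rule that fires on an infobox."""
    hits = []
    for tmpl, field, value, cls in RULES:
        if tmpl == template and fields.get(field) == value:
            hits.append((cls, f"{tmpl}.{field} = {value}"))
    return hits
```

An empty result from `explain` is itself useful:  for an infobox with 
`settlement_type = [[City]]` this exact-match version returns nothing,  
which points straight at the markup as the cause.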

    It seems like the first thing I ought to do is make a list of cities 
that aren't identified as such in DBpedia,  and then the next stage is 
to work down that list and find the problems.
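
    Building that list is just a set difference once the two inputs 
exist.  A minimal Python sketch,  with placeholder data:  the expected 
list uses places named in this message,  and the "typed" list is 
invented,  not a real DBpedia result.

```python
def missing_cities(expected, typed_as_city):
    """Names we expect to be cities that the dataset does not type as City."""
    return sorted(set(expected) - set(typed_as_city))


# Placeholder inputs, for illustration only.
expected = ["Dresden", "London", "Manchester, New Hampshire", "Tokyo"]
typed_as_city = ["Dresden", "London"]

todo = missing_cities(expected, typed_as_city)
```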

    (6) I'm also interested in a mapping between DBpedia ontology 
concepts and the Wikipedia pages that describe them.  For instance,  
there's

http://en.wikipedia.org/wiki/City

    Practically,  if I'm building a site that uses the dbpedia ontology 
(or something similar) I'm going to want to have user-friendly pages 
that have something to say about the taxonomic classes that the site uses.

    (7) Along those lines,  dbpedia-owl:Building really drives me nuts.  
As Wikipedia puts it,

   1. Any human-made structure used or intended for supporting or
      sheltering any use or continuous occupancy, or
   2. An act of construction (i.e. the activity of building, see also
      builder)

    dbpedia-owl:Building has a number of subclasses under it which don't 
match the vernacular meaning of the word "Building",  which is meaning 
#1.  Practically,  I'd say that a building provides an environmental 
shell,  and would not include

Airport, Bridge, and LaunchPad

    I would include the Vehicle Assembly Building at Cape Kennedy as a 
Building,  however,  since that provides an environmental shell.  I'd 
say that a Barn or even a 3-sided run-in shed is a "Building" because 
there's a full or partial environmental shell,  and that Stations and 
Stadiums are generally buildings,  because they are human-inhabited and 
provide at least a partial environmental shell.

    I wouldn't mind using the word "Structure" for what "Building" is 
now,  and I'd probably want to move "Monument" under it as well.
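
    As a sketch of what that rearrangement would mean in practice,  here 
are the proposed parent links as a small Python table,  with a check that 
walks up the chain.  This reflects only my suggestion above (with Station 
and Stadium placed under Building as an assumption),  not the current 
DBpedia ontology.

```python
# Proposed parent links (child -> parent); hypothetical, per the suggestion
# in this message rather than the ontology as it stands today.
PARENT = {
    "Building": "Structure",
    "Airport": "Structure",
    "Bridge": "Structure",
    "LaunchPad": "Structure",
    "Monument": "Structure",
    "Station": "Building",
    "Stadium": "Building",
}


def is_a(cls, ancestor):
    """Walk up the parent chain and report whether ancestor is reached."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT.get(cls)
    return False
```

Under this layout `is_a("Bridge", "Building")` is false while 
`is_a("Bridge", "Structure")` is true,  which is the vernacular reading.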

    (Note that the "environmental shell" is an issue with monuments too.  
It's possible to go inside the Statue of Liberty,  but it's a special 
tour and takes some effort to do.  Does that make the Statue of Liberty 
a building?  The Gateway Arch,  Eiffel Tower and Tokyo Tower probably 
all fall into the "Building" category because they all have a high level 
of accommodation for visitors.)

    Specifically,  the issue I've got is that ordinary users are going 
to have a hard time with the statement that "a Bridge is a Building";  
for projects like ny-pictures.com I really need some category that 
corresponds to the vernacular use of the word "Building" and that avoids 
things that look "crazy".




------------------------------------------------------------------------------
_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
