Hi Paul,

thanks a lot for your very insightful experience report about the Semantic Web, RDF and DBPedia.

(more thoughts inline)

On 02.07.2010 17:07, Paul Houle wrote:
Here are some of my thoughts


[skip]


(4) I'm one of the people who got interested in semantic tech because of
DBPedia,  but yet,  I've also largely given up on DBPedia.  One day I
realized that I could,  with Freebase,  do things in 20 minutes that
would take 2 weeks of data cleanup with DBPedia.  DBPedia 3.5/3.5.1
seems to be a large step backwards,  with major key integrity problems
that are completely invisible to 'open world' and OWL-paradigm systems.
  I've wound up writing my own framework for extracting 'facts' from
Wikipedia because DBPedia isn't interested in extracting the things I
want.  Every time I try to do something with DBpedia,  I make shocking
discoveries (for instance, "New York City", "Berlin", "Tokyo",
"Washington, D.C." and "Manchester, N.H." are not of rdf:type "City").
  The fact that I see so little complaining about this on the mailing
list seems to indicate that not a lot of people are trying to do real
work with it.

I keep asking myself why DBPedia (and now also Uberblic) uses its own (very large) ontology specification in the background. Of course, they sometimes reuse pieces of well-established ontology specifications, but I think this practice should be strongly encouraged. There are some good (well-defined and well-established) domain-specific ontology specifications out there, e.g. the Music Ontology for the music domain, which should be used instead of DBPedia's own concept and property definitions in that domain.

I know one could reply that we could apply ontology mapping/alignment here. However, that would bloat the whole knowledge base with superfluous mappings, and it would slow down reasoning over it. I also know that anyone is free to say anything about anything. Still, I think it creates a lot of redundancy if we define the same concepts and properties over and over again, each time with the same definition of their meaning. If we want a huge distributed database on the Web, then we should at least agree on some important 'best practice' patterns (ontology reuse is one of them) to establish good interlinking between individual datasets.
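To make the cost of mappings versus reuse a bit more concrete, here is a minimal sketch in Python. It assumes a toy in-memory set of triples, and all the data is hypothetical. The prefixed names (mo:MusicArtist, dbo:MusicalArtist, owl:equivalentClass) stand in for full IRIs. The function names instances_of and apply_equivalences are made up for illustration and are not any real RDF library's API. The sketch shows that a plain query for the reused class finds only the dataset that reuses it. Finding the other dataset needs an extra mapping triple plus a reasoning pass, done here by naive forward chaining, and that pass materializes additional triples.

```python
# Toy illustration with hypothetical data: prefixed names stand in for full IRIs.
Triple = tuple[str, str, str]

RDF_TYPE = "rdf:type"
EQUIV = "owl:equivalentClass"

# Dataset A reuses the Music Ontology class directly.
dataset_a: set[Triple] = {
    ("ex:artist1", RDF_TYPE, "mo:MusicArtist"),
}

# Dataset B uses its own class for the same concept and needs a mapping.
dataset_b: set[Triple] = {
    ("ex:artist2", RDF_TYPE, "dbo:MusicalArtist"),
    ("dbo:MusicalArtist", EQUIV, "mo:MusicArtist"),
}


def instances_of(triples: set[Triple], cls: str) -> set[str]:
    """Subjects explicitly typed with `cls` (no reasoning applied)."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == cls}


def apply_equivalences(triples: set[Triple]) -> set[Triple]:
    """Naive forward chaining: copy rdf:type statements across
    owl:equivalentClass links (in both directions) until nothing new appears."""
    closed = set(triples)
    while True:
        equiv = {(a, b) for a, p, b in closed if p == EQUIV}
        equiv |= {(b, a) for a, b in equiv}
        new = {
            (s, RDF_TYPE, b)
            for s, p, o in closed
            if p == RDF_TYPE
            for a, b in equiv
            if a == o
        }
        if new <= closed:
            return closed
        closed |= new


merged = dataset_a | dataset_b
print(sorted(instances_of(merged, "mo:MusicArtist")))  # ['ex:artist1']
closed = apply_equivalences(merged)
print(sorted(instances_of(closed, "mo:MusicArtist")))  # ['ex:artist1', 'ex:artist2']
print(len(closed) - len(merged))  # 2 extra triples materialized
```

Real reasoners are far more sophisticated than this loop. Even so, the toy version shows the trade-off: every parallel class definition adds a mapping and extra inferred triples that simply reusing mo:MusicArtist would have avoided.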


Cheers,


Bob
