Hi Paul,
thanks a lot for your very insightful experience report about the
Semantic Web, RDF and DBpedia.
(more thoughts inline)
On 02.07.2010 17:07, Paul Houle wrote:
Here are some of my thoughts
[skip]
(4) I'm one of the people who got interested in semantic tech because of
DBPedia, but yet, I've also largely given up on DBPedia. One day I
realized that I could, with Freebase, do things in 20 minutes that
would take 2 weeks of data cleanup with DBPedia. DBPedia 3.5/3.5.1
seems to be a large step backwards, with major key integrity problems
that are completely invisible to 'open world' and OWL-paradigm systems.
I've wound up writing my own framework for extracting 'facts' from
wikipedia because DBPedia isn't interested in extracting the things I
want. Every time I try to do something with DBpedia, I make shocking
discoveries (for instance, "New York City", "Berlin", "Tokyo",
"Washington, D.C." and "Manchester, N.H." are not of rdf:type "City").
The fact that I see so little complaining about this on the mailing
list seems to indicate that not a lot of people are trying to do real
work with it.
I keep asking myself why DBpedia (and now also Uberblic) uses its own
(very large) ontology specification in the background. Of course, they
sometimes reuse pieces of well-established ontology specifications.
However, I think this pattern should be strongly reinforced. There are
some good (well-defined and well-established) domain-specific ontology
specifications out there, e.g. the Music Ontology (for the music
domain), which should be used there instead of DBpedia's own concept
and property definitions.
I know one could now say that we could apply ontology
mapping/alignment here. However, that would blow up the whole knowledge
base (with superfluous mappings) and it would slow down the reasoning
process over it. I also know that everyone is free to say anything
about anything. Still, I think it is highly redundant to define the
same concepts and properties over and over again and to explain their
meaning with the same definitions.
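To make the redundancy concrete, here is a minimal sketch in Python. It
is only an illustration under stated assumptions: triples are plain
tuples rather than a real RDF store, the example.org resource is
hypothetical, and the lookup handles nothing beyond owl:equivalentClass.
The two class URIs are the Music Ontology's MusicArtist and the DBpedia
ontology's MusicalArtist.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_EQUIVALENT_CLASS = "http://www.w3.org/2002/07/owl#equivalentClass"
MO_MUSIC_ARTIST = "http://purl.org/ontology/mo/MusicArtist"
DBO_MUSICAL_ARTIST = "http://dbpedia.org/ontology/MusicalArtist"

# Hypothetical resource, used for illustration only.
ARTIST = "http://example.org/artist/1"


def equivalent_classes(triples, cls):
    """Collect cls plus every class linked to it via owl:equivalentClass."""
    found = {cls}
    changed = True
    while changed:  # repeat until no new equivalent class turns up
        changed = False
        for s, p, o in triples:
            if p != OWL_EQUIVALENT_CLASS:
                continue
            # The property is symmetric, so follow it in both directions.
            for a, b in ((s, o), (o, s)):
                if a in found and b not in found:
                    found.add(b)
                    changed = True
    return found


def instances_of(triples, cls):
    """Return the sorted subjects typed as cls or as an equivalent class."""
    classes = equivalent_classes(triples, cls)
    return sorted(s for s, p, o in triples if p == RDF_TYPE and o in classes)


# Dataset that reuses the Music Ontology term directly.
reused = [(ARTIST, RDF_TYPE, MO_MUSIC_ARTIST)]

# Dataset that uses its own term and therefore needs a mapping triple.
own = [
    (ARTIST, RDF_TYPE, DBO_MUSICAL_ARTIST),
    (DBO_MUSICAL_ARTIST, OWL_EQUIVALENT_CLASS, MO_MUSIC_ARTIST),
]

print(instances_of(reused, MO_MUSIC_ARTIST))
print(instances_of(own, MO_MUSIC_ARTIST))
```

Both lookups find the same resource, but the second dataset carries an
extra mapping triple and needs the equivalence step before a client
asking for mo:MusicArtist gets any answer at all.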
If we want a huge distributed database on the Web, then we should at
least agree on some important 'best practice' patterns (ontology reuse
is one of them) to establish good interlinking between individual
datasets.
Cheers,
Bob