The WikiData project is great, but this article seems to be saying "the raw infobox data is dirty, oh no!" while ignoring the consistent infobox ontology, which has already taken care of many of the problems that the article mentions. E.g., the query

    select ?c COUNT(*) AS ?cnt { ?s dbpprop:country ?c . } GROUP BY(?c) ORDER BY DESC(?cnt) LIMIT 30

does highlight the dirtiness in the raw data [1]. However, using the corresponding infobox ontology property, dbpedia-owl:country, we get much cleaner results:

    select ?c (count(*) as ?cnt) { ?s dbpedia-owl:country ?c . } GROUP BY(?c) ORDER BY DESC(?cnt) LIMIT 30

This returns only OWL individuals, and they all have type country. This means that you don't need to, as the article suggests, rewrite the query as:

    select ?s { ?s dbpprop:country ?c . FILTER(?c IN (:United_States,"United States"@en,"USA"@en,...)) }

because the other alternative mentioned, i.e.:

>>>
One strategy is to apply a cleaning process to a database before we run queries. We clean up the data first, load it into a database, and then do queries. We'd deal with the multiple names by rewriting "United States"@en and all the other variants to :United_States.
<<<

has already been done in the infobox ontology.

I think that WikiData's a great project and may very well be the right way to "do it right from the beginning." That said, it seems a bit disingenuous to ignore the higher quality data that's already available in DBpedia in the infobox ontology.

//JT

[1] As an aside, the article's query is also not legal SPARQL, although the endpoint accepts it. It should be "select ?c (COUNT(*) as ?cnt) { …", with parentheses around the "COUNT(*) as ?cnt" expression. You can validate it with sparql.org's query validator.

On Mon, Jul 14, 2014 at 6:49 PM, Paul Houle <ontolo...@gmail.com> wrote:
> http://blog.databaseanimals.com/the-trouble-with-dbpedia
>
> --
> Paul Houle
> Expert on Freebase, DBpedia, Hadoop and RDF
> (607) 539 6254 paul.houle on Skype ontolo...@gmail.com
>
> ------------------------------------------------------------------------------
> Want fast and easy access to all the code in your enterprise?
> Index and search up to 200,000 lines of code with a free copy of Black Duck
> Code Sight - the same software that powers the world's largest code
> search on Ohloh, the Black Duck Open Hub! Try it now.
> http://p.sf.net/sfu/bds
> _______________________________________________
> Dbpedia-discussion mailing list
> Dbpedia-discussion@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion

--
Joshua Taylor, http://www.cs.rpi.edu/~tayloj/