The Wikidata project is great, but this article seems to be saying
"the raw infobox data is dirty, oh no!" while ignoring the consistent
infobox ontology, which has already taken care of many of the problems
that the article mentions.  E.g., the query

select ?c COUNT(*) AS ?cnt {
   ?s dbpprop:country ?c .
}
GROUP BY(?c)
ORDER BY DESC(?cnt)
LIMIT 30

does highlight the dirtiness in the raw data [1].  However, using the
corresponding infobox ontology property, dbpedia-owl:country, we get
much cleaner results:

select ?c (count(*) as ?cnt) {
  ?s dbpedia-owl:country ?c .
}
GROUP BY(?c)
ORDER BY DESC(?cnt)
LIMIT 30

This returns only OWL individuals, and they all have type country.
This means that you don't need to, as the article suggests, rewrite
the query as:

select ?s {
  ?s dbpprop:country ?c .
  FILTER(?c IN (:United_States,"United States"@en,"USA"@en,...))
}

because the other alternative mentioned, i.e.:

>>>
One strategy is to apply a cleaning process to a database before we
run queries. We clean up the data first, load it into a database, and
then do queries. We'd deal with the multiple names by rewriting
"United States"@en and all the other variants to :United_States.
<<<

has already been done in the infobox ontology. I think that Wikidata
is a great project and may very well be the right way to "do it right
from the beginning."  That said, it seems a bit disingenuous to ignore
the higher-quality data that's already available in DBpedia through
the infobox ontology.
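
For concreteness, here is a sketch of what the article's FILTER query
might reduce to with the ontology property. It assumes that the
article's :United_States means the DBpedia resource of that name and
that the prefixes expand as declared below; I haven't checked the
results against the endpoint, so treat it as an illustration only:

```sparql
# Assumed prefix expansions; the DBpedia endpoint may already predefine them.
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dbpedia:     <http://dbpedia.org/resource/>

# One triple pattern replaces the FILTER over all the literal variants,
# since the ontology property points at the country resource itself.
select ?s {
  ?s dbpedia-owl:country dbpedia:United_States .
}
```

No list of spelling variants is needed here because the value is an
IRI rather than a string.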

//JT

[1]  As an aside, the first query is also not legal SPARQL, although
the endpoint accepts it.  It should be "select ?c (COUNT(*) as ?cnt) { …"
with parentheses around the "COUNT(*) as ?cnt" expression.  You can
check this with sparql.org's query validator.

On Mon, Jul 14, 2014 at 6:49 PM, Paul Houle <ontolo...@gmail.com> wrote:
> http://blog.databaseanimals.com/the-trouble-with-dbpedia
>
> --
> Paul Houle
> Expert on Freebase, DBpedia, Hadoop and RDF
> (607) 539 6254    paul.houle on Skype   ontolo...@gmail.com
>
> ------------------------------------------------------------------------------
> Want fast and easy access to all the code in your enterprise? Index and
> search up to 200,000 lines of code with a free copy of Black Duck
> Code Sight - the same software that powers the world's largest code
> search on Ohloh, the Black Duck Open Hub! Try it now.
> http://p.sf.net/sfu/bds
> _______________________________________________
> Dbpedia-discussion mailing list
> Dbpedia-discussion@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion



-- 
Joshua Taylor, http://www.cs.rpi.edu/~tayloj/
