I guess Pablo is right...

I was more concerned with the live extraction issue I reported,
so I gave a more technical answer rather than looking at it from the Wikipedia perspective
:)

The data can be pretty messy, and if you take the local DBpedias into account
the problem is amplified.
Apart from fixing the parsers, we could also create tools to find such
errors and point them out to Wikipedia editors.
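
To make the idea of such a tool a bit more concrete, here is a rough Python
sketch, not anything taken from the extraction framework. All names
(classify, leading_digits, report) are hypothetical, and treating a space or
non-breaking space as a thousands separator is my assumption. It also mimics
a parser that keeps only the leading digits, which is how I assume "74 544"
ends up as 74.

```python
import re

# Hypothetical helpers for illustration; not part of the DBpedia framework.
STRICT_INT = re.compile(r"^[0-9]+$")
# Assumption: a space or non-breaking space between 3-digit groups is a
# thousands separator. Commas and dots are left out because they are
# ambiguous across locales.
GROUPED_INT = re.compile(r"^[0-9]{1,3}(?:[ \u00a0][0-9]{3})+$")
LEADING_DIGITS = re.compile(r"^[0-9]+")


def leading_digits(raw):
    """Mimic a parser that keeps only the leading digit run (assumed behaviour)."""
    match = LEADING_DIGITS.match(raw.strip())
    return int(match.group()) if match else None


def classify(raw):
    """Return (label, suggested_value) for one raw population string."""
    value = raw.strip()
    if STRICT_INT.match(value):
        return "ok", int(value)
    if GROUPED_INT.match(value):
        # Digits grouped in thousands, e.g. "74 544": probably one number.
        return "grouped", int(re.sub(r"[^0-9]", "", value))
    # Anything else (prefixes like "c.", units, free text) needs a human.
    return "suspicious", None


def report(rows):
    """Yield (resource, raw, label, suggestion) for every value needing review."""
    for resource, raw in rows:
        label, suggestion = classify(raw)
        if label != "ok":
            yield resource, raw, label, suggestion


if __name__ == "__main__":
    sample = [
        ("Szolnok", "74 544"),
        ("Beetgum", "c. 735"),
        ("Varadero", "aprox. 20000"),
        ("Szolnok (after fix)", "74544"),
    ]
    for row in report(sample):
        print(row)
```

The "grouped" cases could be suggested to editors with the cleaned-up number,
while the "suspicious" ones would only be listed for manual review.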

Dimitris

On Mon, Nov 7, 2011 at 10:36 PM, Pablo Mendes <pablomen...@gmail.com> wrote:

>
> Hi all,
> First thing, thanks to Zsíros for pointing out the error, to the DBpedia
> co-founder Sören for his quick response - can we assign bugs to you too? :P
> - and to our i18n pioneer Dimitris for looking deeper into the issue.
>
> Dimitris has a point there. That is not a valid number. However, maybe we
> shouldn't say that there is no problem with the parser.
>
> I tried the query below on http://dbpedia.org/sparql
>
> select ?outlier ?pop
> where {{
> ?outlier a dbpedia-owl:PopulatedPlace .
> ?outlier dbpprop:populationTotal ?pop .
> FILTER regex(?pop, "[^0-9]+[0-9]+")
> }
> union
> {
> ?outlier a dbpedia-owl:PopulatedPlace .
> ?outlier dbpprop:populationTotal ?pop .
> FILTER regex(?pop, "[0-9]+[^0-9]+")
> }}
>
>
> It returns more than 1000 results where there are non-digit characters in
> what should be a numeric field. This illustrates the messy nature of the data
> we're dealing with. It should also give you an idea of how hard it is to
> get the parsing right:
>
> http://dbpedia.org/resource/Harlow
> http://dbpedia.org/resource/List_of_English_districts_by_population
> http://dbpedia.org/resource/Beetgum "c. 735"@en
> http://dbpedia.org/resource/Conakry "6.9522624E9"^^<http://dbpedia.org/datatype/second>
> http://dbpedia.org/resource/Varadero "aprox. 20000"@en
>
> For many of these resources, the property is simply not there.
>
> The parser already does a great job, but there is much to improve. These
> problems are super tough to solve in a generic way. But our research is
> progressing in that direction, and between the groups active in this
> community I'm sure many good solutions will pop up.
>
> Best,
> Pablo
>
>
> On Mon, Nov 7, 2011 at 8:13 PM, Dimitris Kontokostas <jimk...@gmail.com> wrote:
>
>> Hi,
>>
>> There is no problem with the parser: the number has a space between 74
>> and 544 (74_544), so it is not a valid number.
>> dbpedia-owl tries to validate the value as a number, so it gets 74;
>> dbpprop does not validate values and accepts them as text (74 544).
>>
>> However, I spotted a problem with the live extraction.
>> I changed the article to the correct number but the following happened:
>> 1) dbpprop:populationTotal <http://live.dbpedia.org/property/populationTotal>
>>    kept both the new and the previous value (74544 & 74 544)
>> 2) dbpedia-owl:populationTotal <http://live.dbpedia.org/ontology/populationTotal>
>>    remained 74 (it did not change to 74544)
>>
>> (http://live.dbpedia.org/page/Szolnok)
>>
>> Cheers,
>> Dimitris
>>
>>
>> On Mon, Nov 7, 2011 at 2:43 PM, Sören Auer <
>> a...@informatik.uni-leipzig.de> wrote:
>>
>>> Dear Zsíros,
>>>
>>> Indeed this seems to be a parser problem. Interestingly
>>> http://live.dbpedia.org/page/Szolnok has at least a better value for
>>> dbpprop:populationTotal (74 544).
>>> I'm CCing the DBpedia mailinglist, since there might be people able to
>>> help there. I will also discuss this issue with my colleagues working
>>> here on DBpedia.
>>>
>>> Sören
>>>
>>>
>>> Am 07.11.2011 13:35, schrieb Levente Zsíros:
>>> > Hello!
>>> > The city Szolnok ( http://en.wikipedia.org/wiki/Szolnok ) in DBpedia has
>>> > a population of 74, which is wrong. On the other hand, Wikipedia has the
>>> > correct value. Isn't DBpedia supposed to be in sync with Wikipedia? Or
>>> > is your wiki-parser faulty?
>>> >
>>> > http://dbpedia.org/page/Szolnok     dbpprop:populationTotal
>>> > <http://dbpedia.org/property/populationTotal>
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> >
>>> > Zsíros Levente
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> RSA(R) Conference 2012
>>> Save $700 by Nov 18
>>> Register now
>>> http://p.sf.net/sfu/rsa-sfdev2dev1
>>> _______________________________________________
>>> Dbpedia-discussion mailing list
>>> Dbpedia-discussion@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>>
>>
>>
>>
>> --
>> Kontokostas Dimitris
>>
>>
>> _______________________________________________
>> Dbpedia-developers mailing list
>> dbpedia-develop...@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
>>
>>
>


-- 
Kontokostas Dimitris
