Hello!

Errors of this kind should certainly be caught in the editor.
In Sztakipedia we actually have a tool for editing infoboxes:
http://www.youtube.com/watch?v=4M0tIIVwm4c (jump to 6:00!). It is part
of a rich text editor for MediaWiki that we abandoned in order to work
on the toolbar, but this part could be reused in the toolbar too. I will
look into it as my schedule permits.

You might ask why we abandoned an RTE that we had spent months developing.
The reason is that, as I learned in June, the foundation has decided to
develop a new parser and a new Visual Editor:
http://www.mediawiki.org/wiki/Visual_Editor_design (of course, you might
already know this). The parser development also involves changes to the
syntax, in order to finally get some kind of grammar. In addition, an
intermediate structure, the Wiki Object Model (WOM), is being developed,
which in theory enables programmatic access to the elements of an article,
so I hope we can forget about wikitext.
It is a bit hard for me to see when (and if) that project will be finished,
but I think it will be a total game changer for both Sztakipedia and DBpedia.
In that interface they could easily implement constraints on fields.
Personally, my plan is to make SzP one of the first plug-ins for that editor.
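
To make the idea of a field constraint concrete, here is a minimal sketch in Python. It is not code from Sztakipedia or from the Visual Editor; the function name, the pattern, and the choice to accept comma-grouped digits are all assumptions made for illustration.

```python
from __future__ import annotations

import re

# Hypothetical constraint for a numeric infobox field: plain digits,
# or digits grouped in threes by commas (e.g. "74544" or "74,544").
POPULATION_PATTERN = re.compile(r"\d+|\d{1,3}(,\d{3})+")


def check_population_field(raw: str) -> list[str]:
    """Return a list of problems found; an empty list means the value passes."""
    value = raw.strip()
    problems = []
    if not value:
        problems.append("value is empty")
    elif not POPULATION_PATTERN.fullmatch(value):
        problems.append(f"{value!r} is not a plain number")
    return problems
```

With a check like this, a value such as "74 544" or "c. 370000" would be flagged before it is saved, while "74544" passes.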

OK, so what I really wanted to add is that we had better keep an eye on the
new parser and editor they are working on before solving everything in the
old system.

Oh, and one small thing: Zsíros is the surname and Levente is the given
name, so you probably want to call him Levente instead of Zsíros ;)
(But that is just the classic problem of Hungarian name order, which puts
the surname first.)

Best to all
Mihály

On 8 November 2011 19:38, Pablo Mendes <pablomen...@gmail.com> wrote:

> Dimitris,
>
>
>> Apart from fixing the parsers, we could also create tools to find such
>> errors and point them to Wikipedia editors
>
>
> Nice. That is also a great idea! Mihály could probably keep this on his
> Sztakipedia radar, if he doesn't already. :) Of course for some cases they
> are not really mistakes, but rather an indication of imprecision (c. 370000
> is not the same as 370000). But I think for many fields it would be
> possible to do some kind of validation, or to suggest ways to homogenize
> the values. In other cases, we can try some magic on the server side.
>
> Cheers,
> Pablo
>
> On Tue, Nov 8, 2011 at 7:14 PM, Dimitris Kontokostas <jimk...@gmail.com> wrote:
>
>> I guess Pablo is right...
>>
>> I was more concerned with the live issue I reported,
>> so I gave a more technical answer rather than one from the Wikipedia
>> perspective :)
>>
>> The data can be pretty messy, and if you take the local DBpedias into
>> account the problem is amplified.
>> Apart from fixing the parsers, we could also create tools to find such
>> errors and point them to Wikipedia editors
>>
>> Dimitris
>>
>>
>> On Mon, Nov 7, 2011 at 10:36 PM, Pablo Mendes <pablomen...@gmail.com> wrote:
>>
>>>
>>> Hi all,
>>> First thing, thanks to Zsíros for pointing out the error, to the DBpedia
>>> co-founder Sören for his quick response - can we assign bugs to you too? :P
>>> - and to our i18n pioneer Dimitris for looking deeper into the issue.
>>>
>>> Dimitris has a point there. That is not a valid number. However, maybe
>>> we shouldn't say that there is no problem with the parser.
>>>
>>> I tried the query below on http://dbpedia.org/sparql
>>>
>>> SELECT ?outlier ?pop
>>> WHERE {
>>>   {
>>>     ?outlier a dbpedia-owl:PopulatedPlace .
>>>     ?outlier dbpprop:populationTotal ?pop .
>>>     FILTER regex(?pop, "[^0-9]+[0-9]+")
>>>   }
>>>   UNION
>>>   {
>>>     ?outlier a dbpedia-owl:PopulatedPlace .
>>>     ?outlier dbpprop:populationTotal ?pop .
>>>     FILTER regex(?pop, "[0-9]+[^0-9]+")
>>>   }
>>> }
>>>
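
To see what the two patterns in the query select, the same test can be sketched offline in Python. SPARQL's regex() matches anywhere in the string, so re.search is the closest equivalent. The function name is hypothetical and the sample values are taken from this thread; this is only an illustration, not a replacement for running the query.

```python
import re

# The two patterns from the query: a non-digit run followed by digits,
# and digits followed by a non-digit run.
NON_DIGIT_THEN_DIGIT = re.compile(r"[^0-9]+[0-9]+")
DIGIT_THEN_NON_DIGIT = re.compile(r"[0-9]+[^0-9]+")


def is_outlier(value: str) -> bool:
    """True if either pattern matches somewhere in the value."""
    return bool(
        NON_DIGIT_THEN_DIGIT.search(value) or DIGIT_THEN_NON_DIGIT.search(value)
    )


# Sample values that appear in this thread.
for sample in ["74544", "74 544", "c. 735", "aprox. 20000", "6.9522624E9"]:
    print(f"{sample!r}: {is_outlier(sample)}")
```

Only the plain "74544" is left alone; every value containing a space, a prefix, or a decimal point or exponent is reported.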
>>>
>>> It returns more than 1000 results where there are characters within what
>>> should be a numeric field. This exemplifies the (messy) nature of the data
>>> we're dealing with. It should also give you an idea of how hard it is to
>>> get the parsing right:
>>>
>>> http://dbpedia.org/resource/Harlow
>>> http://dbpedia.org/resource/List_of_English_districts_by_population
>>> http://dbpedia.org/resource/Beetgum "c. 735"@en
>>> http://dbpedia.org/resource/Conakry "6.9522624E9"^^<http://dbpedia.org/datatype/second>
>>> http://dbpedia.org/resource/Varadero "aprox. 20000"@en
>>>
>>> For many of these resources, the property is simply not there.
>>>
>>> The parser already does a great job, but there is much to improve. These
>>> problems are super tough to solve in a generic way. But our research is
>>> progressing in that direction, and between the groups active in this
>>> community I'm sure many good solutions will pop up.
>>>
>>> Best,
>>> Pablo
>>>
>>>
>>> On Mon, Nov 7, 2011 at 8:13 PM, Dimitris Kontokostas <jimk...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> There is no problem with the parser: the number has a space between 74
>>>> and 544 (74 544), so it is not a valid number.
>>>> dbpedia-owl tries to validate the value as a number, so it gets 74.
>>>> dbpprop does not validate values and accepts them as text (74 544).
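
One way to picture the two behaviours described here is the Python sketch below. It is not DBpedia's extraction code; both function names are made up, and the "stop at the first non-digit" rule is only an assumption that happens to reproduce the value 74.

```python
import re
from typing import Optional


def parse_leading_integer(raw: str) -> Optional[int]:
    """Mimic a strict numeric parse that stops at the first non-digit."""
    match = re.match(r"\s*(\d+)", raw)
    return int(match.group(1)) if match else None


def keep_as_text(raw: str) -> str:
    """Mimic an unvalidated property: the raw string is stored unchanged."""
    return raw


infobox_value = "74 544"
print(parse_leading_integer(infobox_value))  # 74
print(repr(keep_as_text(infobox_value)))     # '74 544'
```

Under these assumptions the validated property ends up with 74 and the raw property with the text "74 544", which matches what was observed for Szolnok.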
>>>>
>>>> However, I spotted a problem with the live extraction.
>>>> I changed the article to the correct number, but the following happened:
>>>> 1) dbpprop:populationTotal <http://live.dbpedia.org/property/populationTotal>
>>>>    kept both the new and the previous value (74544 & 74 544)
>>>> 2) dbpedia-owl:populationTotal <http://live.dbpedia.org/ontology/populationTotal>
>>>>    remained 74 (it did not change to 74544)
>>>>
>>>> (http://live.dbpedia.org/page/Szolnok)
>>>>
>>>> Cheers,
>>>> Dimitris
>>>>
>>>>
>>>> On Mon, Nov 7, 2011 at 2:43 PM, Sören Auer <a...@informatik.uni-leipzig.de> wrote:
>>>>
>>>>> Dear Zsíros,
>>>>>
>>>>> Indeed this seems to be a parser problem. Interestingly
>>>>> http://live.dbpedia.org/page/Szolnok has at least a better value for
>>>>> dbpprop:populationTotal (74 544).
>>>>> I'm CCing the DBpedia mailing list, since there might be people able to
>>>>> help there. I will also discuss this issue with my colleagues working
>>>>> here on DBpedia.
>>>>>
>>>>> Sören
>>>>>
>>>>>
>>>>> On 07.11.2011 13:35, Levente Zsíros wrote:
>>>>> > Hello!
>>>>> > The city Szolnok ( http://en.wikipedia.org/wiki/Szolnok ) in DBpedia has
>>>>> > a population of 74, which is wrong. On the other hand, Wikipedia has the
>>>>> > correct value. Isn't DBpedia supposed to be in sync with Wikipedia? Or
>>>>> > is your wiki-parser faulty?
>>>>> >
>>>>> > http://dbpedia.org/page/Szolnok     dbpprop:populationTotal
>>>>> > <http://dbpedia.org/property/populationTotal>
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > --
>>>>> >
>>>>> > Zsíros Levente
>>>>>
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> RSA(R) Conference 2012
>>>>> Save $700 by Nov 18
>>>>> Register now
>>>>> http://p.sf.net/sfu/rsa-sfdev2dev1
>>>>> _______________________________________________
>>>>> Dbpedia-discussion mailing list
>>>>> Dbpedia-discussion@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Kontokostas Dimitris
>>>>
>>>>
>>>> _______________________________________________
>>>> Dbpedia-developers mailing list
>>>> dbpedia-develop...@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
>>>>
>>>>
>>>
>>
>>
>> --
>> Kontokostas Dimitris
>>
>
>
