Hi Julien,

thanks for the heads-up!

On 5 April 2013 10:44, Julien Plu <julien....@redaction-developpez.com> wrote:
> Hi,
>
> A few days ago I saw that MediaWiki has, for about a month now, allowed
> infoboxes (or parts of them) to be created with the Lua scripting language.
> http://www.mediawiki.org/wiki/Lua_scripting
>
> So my question is: if all the data in the Wikipedia infoboxes ends up in Lua
> scripts, will DBpedia still be able to retrieve all the data as usual?

I'm not 100% sure, and we should look into this, but I think that Lua
is only used in template definitions, not in template calls or other
places in content pages. DBpedia does not parse template definitions,
only content pages. The content pages probably will only change in
minor ways, if at all. For example, {{Foo}} might change to
{{#invoke:Foo}}. But that's just my preliminary understanding after
looking through a few tutorial pages.
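
To check this assumption on real content pages, one could count how many
calls use the plain form and how many use the #invoke form. The following
is only a rough sketch: the regular expression and the function name are
made up for illustration and are not part of the DBpedia extraction
framework, and nesting and other parser functions are not handled properly.

```python
import re

# Matches the opening of a call: "{{Name" or "{{#invoke:Module".
# Simplification: parameters are ignored and other parser functions
# such as {{#if:...}} are skipped, not classified.
CALL_RE = re.compile(r"\{\{\s*(#invoke:)?\s*([^|{}#]+?)\s*(?=[|}])")


def classify_calls(wikitext):
    """Return (kind, name) pairs for each call found in the wikitext."""
    calls = []
    for match in CALL_RE.finditer(wikitext):
        kind = "lua" if match.group(1) else "template"
        calls.append((kind, match.group(2)))
    return calls


if __name__ == "__main__":
    print(classify_calls("{{Foo}} might change to {{#invoke:Foo}}"))
```

If content pages turn out to contain mostly plain template calls, the
existing extractors should be largely unaffected.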

>
> My other question mainly concerns Wikipedia FR, because I haven't found
> the same thing in English, sorry. For almost a year now, for the infobox
> population property, we have been able to do something like this:
>
> population = {{population number}}
>
> Where "population number" refers to a number which is on another page. Let
> me give you an example: the Wikipedia page about the city of Toulouse
> contains this infobox property:
>
> | population         = {{Dernière population commune de France}}
>
> And the value of "Dernière population commune de France" is contained in
> this wikipedia page :
> http://fr.wikipedia.org/wiki/Mod%C3%A8le:Donn%C3%A9es/Toulouse/%C3%A9volution_population
>
> So now the problem is that in the XML dump we don't have the real value of
> the population. Is there a way to get the value itself and not the "string"
> which represents the value?

I've seen similar structures on Wikipedia de [1] and I think also on
pl or cs: the actual data is not in the content pages, but in some
template, and is rendered on the content page by rather complex
mechanisms.

To deal with this, DBpedia could try to expand templates, or maybe
just certain templates (we don't want all the HTML stuff). That would be
very general, but may cause performance and other problems. In the worst
case, mapping-based extraction could become as slow as abstract
extraction.

Or we could let people add rules on the mappings wiki about which
templates contain data and how the data should be attached to certain
DBpedia resources. Of course, determining syntax and semantics for
such rules wouldn't be trivial...
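
Just to make the idea concrete, such a rule might boil down to a pattern
for the titles of the data templates plus the property the value should be
attached to. This is a sketch under loud assumptions: the class, its fields
and the property name are hypothetical, and no such rule syntax exists on
the mappings wiki. The title is the one from the Toulouse URL above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DataTemplateRule:
    """Hypothetical rule: which template pages hold data, and for what."""

    title_pattern: str  # regex with a named group "resource"
    prop: str           # property to attach the value to (illustrative)

    def resource_for(self, template_title):
        """Return the resource name a data template belongs to, or None."""
        match = re.fullmatch(self.title_pattern, template_title)
        return match.group("resource") if match else None


# Assumed rule for the French population data templates.
rule = DataTemplateRule(
    title_pattern=r"Modèle:Données/(?P<resource>[^/]+)/évolution_population",
    prop="populationTotal",
)

if __name__ == "__main__":
    print(rule.resource_for("Modèle:Données/Toulouse/évolution_population"))
```

The hard part, how to get the actual value out of the template body, is
deliberately left out here, since that depends on each wiki's conventions.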

...but if we get there, we could implement the data extraction as a
preprocessing step: in a first extraction phase, go through the
Wikipedia dump, collect and store stuff from these 'data templates',
and during the main extraction, pull the data from the store where
needed and generate triples. Informally, we already have such a
preprocessing phase for the redirects. It would make sense to
"formalize" it and also use it for other info, e.g. disambiguation
pages, inter-language links, resource types, etc.
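
A rough sketch of the two phases, under heavy simplification: pages are
plain (title, text) pairs, data templates follow a made-up title convention
"Data/<resource>/population" and contain only the number, and the
placeholder is the "population number" call from Julien's example. None of
these names exist in the extraction framework; they are only here to show
the control flow.

```python
import re

# Hypothetical convention for data template titles.
DATA_TITLE_RE = re.compile(r"^Data/(?P<resource>[^/]+)/population$")
PLACEHOLDER = "{{population number}}"


def collect_data(pages):
    """Phase 1: go through the dump and store values from data templates."""
    store = {}
    for title, text in pages:
        match = DATA_TITLE_RE.match(title)
        if match:
            store[match.group("resource")] = text.strip()
    return store


def extract_triples(pages, store):
    """Phase 2: main extraction, pulling values from the store if needed."""
    triples = []
    for title, text in pages:
        if DATA_TITLE_RE.match(title):
            continue  # data templates are not resources themselves
        for line in text.splitlines():
            key, sep, value = line.lstrip("| ").partition("=")
            if not sep:
                continue  # not a "key = value" infobox line
            value = value.strip()
            if value == PLACEHOLDER:
                value = store.get(title)
                if value is None:
                    continue  # no data collected for this resource
            triples.append((title, key.strip(), value))
    return triples


if __name__ == "__main__":
    # Made-up pages and values, for illustration only.
    pages = [
        ("Data/Exampleville/population", "12345\n"),
        ("Exampleville", "| population = {{population number}}"),
    ]
    print(extract_triples(pages, collect_data(pages)))
```

The point of the split is that phase 1 needs one cheap pass over the dump,
and phase 2 never has to expand a template itself.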

Cheers,
JC

[1] For example, http://de.wikipedia.org/wiki/Hannover contains

{{Infobox Gemeinde in Deutschland
...
| Gemeindeschlüssel = 03241001
...
}}

("Gemeinde in Deutschland" means "community in Germany",
"Gemeindeschlüssel" means "community key".)

The actual data is in pages like

http://de.wikipedia.org/wiki/Vorlage:Metadaten_Einwohnerzahl_DE-NI

>
> I hope I was clear enough; otherwise, don't hesitate to ask me for more
> information about these problems.
>
> Thanks for your insights.
>
> Best regards.
>
> Julien Plu.
>
> ------------------------------------------------------------------------------
> Minimize network downtime and maximize team effectiveness.
> Reduce network management and security costs.Learn how to hire
> the most talented Cisco Certified professionals. Visit the
> Employer Resources Portal
> http://www.cisco.com/web/learning/employer_resources/index.html
> _______________________________________________
> Dbpedia-discussion mailing list
> Dbpedia-discussion@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>

