Hi Pablo, Jona, and all,
On 04/06/2013 01:56 PM, Pablo N. Mendes wrote:
I'd say this topic can safely move out of dbpedia-discussion and to
dbpedia-developers now. :)
I agree with Jona, with one small detail: perhaps it is better not to
load everything in memory if we use a fast database such as Berkeley DB
or JDBM3. They would also allow you to go in-memory when you can
splurge, or disk-backed when restricted. What do you think?
I agree with Pablo's idea, as it will work in both dump and live modes.
Actually, for live extraction we already need a lot of memory, as we
have a running Virtuoso instance that should be updated by the
framework, and we have a local mirror of Wikipedia which uses MySQL
as back-end storage.
So, I would prefer saving as much memory as possible.
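To make the in-memory versus disk-backed trade-off concrete, here is a minimal sketch of a lookup store that can run either way. It is only an illustration: it uses SQLite from Python's standard library as a stand-in for Berkeley DB or JDBM3, and the class name `LookupStore` and the key format are assumptions, not part of the DBpedia framework.

```python
import sqlite3


class LookupStore:
    """Hypothetical key-value lookup store (not DBpedia framework code).

    path=":memory:" keeps everything in RAM; a file path makes it disk-backed.
    """

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)"
        )

    def put_many(self, pairs):
        # One transaction for the whole batch keeps bulk loading fast.
        with self.conn:
            self.conn.executemany(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", pairs
            )

    def get(self, key, default=None):
        row = self.conn.execute(
            "SELECT v FROM kv WHERE k = ?", (key,)
        ).fetchone()
        return row[0] if row else default

    def close(self):
        self.conn.close()
```

The calling code stays the same in both modes: `LookupStore()` when RAM is plentiful, `LookupStore("wikidata.db")` (a hypothetical file name) when it is not.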
Cheers,
Pablo
On Fri, Apr 5, 2013 at 10:01 PM, Jona Christopher Sahnwaldt
<j...@sahnwaldt.de> wrote:
On 5 April 2013 21:27, Andrea Di Menna <ninn...@gmail.com> wrote:
> Hi Dimitris,
>
> I am not completely getting your point.
>
> How would you handle the following example? (supposing the following
> will be possible with Wikipedia/Wikidata)
>
> Suppose you have
>
> {{Infobox:Test
> | name = {{#property:p45}}
> }}
>
> and a mapping
>
> {{PropertyMapping | templateProperty = name | ontologyProperty = foaf:name}}
>
> what would happen when running the MappingExtractor?
> Which RDF triples would be generated?
I think there are two questions here, and two very different
approaches.
1. In the near term, I would expect that Wikipedia templates are
modified like in your example.
How could/should DBpedia deal with this? The simplest solution seems
to be that during a preliminary step, we extract data from Wikidata
and store it. During the main extraction, whenever we find a reference
to Wikidata, we look it up and generate a triple as usual. Not a huge
change.
2. In the long run though, when all data is moved to Wikidata, all
instances of a certain infobox type will look the same. It doesn't
matter anymore if an infobox is about Germany or Italy, because they
all use the same properties:
{{Infobox country
| capital = {{#property:p45}}
| population = {{#property:p42}}
... etc. ...
}}
I guess Wikidata already thought of this and has plans to then replace
the whole infobox by a small construct that simply instructs MediaWiki
to pull all data for this item from Wikidata and display an infobox.
In this case, there will be nothing left to extract for DBpedia.
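For the near-term approach in point 1, the lookup step could be sketched roughly as follows. This is an assumption-laden illustration, not the MappingExtractor's actual behaviour: the function names, the `(item_id, property)` key scheme and the plain-dict store are all hypothetical, and the triple is a simple tuple rather than a framework object.

```python
import re

# Matches the parser-function form used in the examples: {{#property:p45}}
PROPERTY_REF = re.compile(r"\{\{\s*#property:\s*([Pp]\d+)\s*\}\}")


def resolve_value(raw_value, item_id, wikidata):
    """Replace each {{#property:pNN}} with the pre-extracted Wikidata value.

    `wikidata` is any mapping from (item_id, property) to a string
    (hypothetical key scheme). Unknown references are left untouched.
    """
    def substitute(match):
        prop = match.group(1).lower()
        return wikidata.get((item_id, prop), match.group(0))

    return PROPERTY_REF.sub(substitute, raw_value)


def extract_triple(subject, ontology_property, raw_value, item_id, wikidata):
    """Return a (subject, predicate, object) tuple, or None if unresolved."""
    value = resolve_value(raw_value, item_id, wikidata)
    if PROPERTY_REF.search(value):
        return None  # unresolved reference: emit nothing rather than raw markup
    return (subject, ontology_property, value)
```

With a store such as `{("item1", "p45"): "Example Name"}` (made-up data), the infobox value `{{#property:p45}}` mapped to `foaf:name` would yield a triple whose object is `"Example Name"`; a reference that is missing from the store yields no triple at all.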
Implementation detail: we shouldn't use a SPARQL store to look up
Wikidata data, we should keep them in memory. A SPARQL call will
certainly be at least 100 times slower than a lookup in a map, but
probably 10000 times or more. This matters because there will be
hundreds of millions of lookup calls during an extraction. Keeping all
inter-language links in memory takes about 4 or 5 GB - not much. Of
course, keeping all Wikidata data in memory would take between 10 and
100 times as much RAM.
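A back-of-envelope calculation shows what these ratios mean in practice. The slowdown factors (100x, 10000x) and the 4 to 5 GB and 10x to 100x figures come from the paragraph above; the lookup count of 300 million and the 100 ns per map lookup are assumptions picked only to make the arithmetic concrete.

```python
LOOKUPS = 300_000_000        # assumption: one value for "hundreds of millions"
MAP_LOOKUP_SECONDS = 100e-9  # assumption: ~100 ns per in-memory map lookup


def total_hours(slowdown):
    """Total lookup time in hours if each call is `slowdown` times a map lookup."""
    return LOOKUPS * MAP_LOOKUP_SECONDS * slowdown / 3600


def ram_range_gb(base_low=4, base_high=5, factor_low=10, factor_high=100):
    """RAM range for all Wikidata data, scaled from the inter-language links."""
    return base_low * factor_low, base_high * factor_high


for label, slowdown in [
    ("in-memory map", 1),
    ("SPARQL, 100x slower", 100),
    ("SPARQL, 10000x slower", 10_000),
]:
    print(f"{label}: {total_hours(slowdown):.2f} h")

print("All Wikidata data in RAM: %d to %d GB" % ram_range_gb())
```

Under these assumptions the map lookups add up to about 30 seconds, the 100x case to about 50 minutes, and the 10000x case to roughly 83 hours, while keeping all Wikidata data in memory would need somewhere between 40 and 500 GB.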
Cheers,
JC
>
> Cheers
> Andrea
>
>
> 2013/4/5 Dimitris Kontokostas <jimk...@gmail.com>
>>
>> Hi,
>>
>> For me there is no reason to complicate the DBpedia framework by
>> resolving Wikidata data / templates.
>> What we could do is (try to) provide a semantic mirror of Wikidata at,
>> e.g., data.dbpedia.org. We should simplify it by mapping the data to
>> the DBpedia ontology and then use it like any other language edition
>> we have (e.g. nl.dbpedia.org).
>>
>> In dbpedia.org we already aggregate data from other language editions.
>> For now it is mostly labels & abstracts but we can also fuse Wikidata
>> data. This way, whatever is missing from the Wikipedia dumps will be
>> filled in the end by the Wikidata dumps.
>>
>> Best,
>> Dimitris
>>
>>
>> On Fri, Apr 5, 2013 at 9:49 PM, Julien Plu
>> <julien....@redaction-developpez.com> wrote:
>>>
>>> Ok, thanks for the clarification :-) It's perfect; now we are just
>>> waiting for the dump of this data to become available.
>>>
>>> Best.
>>>
>>> Julien Plu.
>>>
>>>
>>> 2013/4/5 Jona Christopher Sahnwaldt <j...@sahnwaldt.de>
>>>>
>>>> On 5 April 2013 19:59, Julien Plu
>>>> <julien....@redaction-developpez.com> wrote:
>>>> > Hi,
>>>> >
>>>> > @Anja : Do you have a blog post or something like that which
>>>> > talks about the RDF dump of Wikidata?
>>>>
>>>> http://meta.wikimedia.org/wiki/Wikidata/Development/RDF
>>>>
>>>> @Anja: do you know when RDF dumps are planned to be available?
>>>>
>>>> > Will the French Wikidata also provide its data in RDF?
>>>>
>>>> There is only one Wikidata - neither English nor French nor any
>>>> other language. It's just data. There are labels in different
>>>> languages, but the data itself is language-agnostic.
>>>>
>>>> >
>>>> > This news interests me very much.
>>>> >
>>>> > Best
>>>> >
>>>> > Julien Plu.
>>>> >
>>>> >
>>>> > 2013/4/5 Tom Morris <tfmor...@gmail.com>
>>>> >>
>>>> >> On Fri, Apr 5, 2013 at 9:40 AM, Jona Christopher Sahnwaldt
>>>> >> <j...@sahnwaldt.de> wrote:
>>>> >>>
>>>> >>>
>>>> >>> thanks for the heads-up!
>>>> >>>
>>>> >>> On 5 April 2013 10:44, Julien Plu
>>>> >>> <julien....@redaction-developpez.com> wrote:
>>>> >>> > Hi,
>>>> >>> >
>>>> >>> > I saw a few days ago that, for about a month now, MediaWiki
>>>> >>> > has allowed infoboxes (or parts of them) to be created with
>>>> >>> > the Lua scripting language.
>>>> >>> > http://www.mediawiki.org/wiki/Lua_scripting
>>>> >>> >
>>>> >>> > So my question is: if all the data in the Wikipedia infoboxes
>>>> >>> > is in Lua scripts, will DBpedia still be able to retrieve all
>>>> >>> > the data as usual?
>>>> >>>
>>>> >>> I'm not 100% sure, and we should look into this, but I think
>>>> >>> that Lua is only used in template definitions, not in template
>>>> >>> calls or other places in content pages. DBpedia does not parse
>>>> >>> template definitions, only content pages. The content pages
>>>> >>> probably will only change in minor ways, if at all. For example,
>>>> >>> {{Foo}} might change to {{#invoke:Foo}}. But that's just my
>>>> >>> preliminary understanding after looking through a few tutorial
>>>> >>> pages.
>>>> >>
>>>> >>
>>>> >> As far as I can see, the template calls are unchanged for all
>>>> >> the templates, which makes sense when you consider that some of
>>>> >> the templates that they've upgraded to use Lua, like
>>>> >> Template:Coord, are used on almost a million pages.
>>>> >>
>>>> >> Here are the ones which have been updated so far:
>>>> >> https://en.wikipedia.org/wiki/Category:Lua-based_templates
>>>> >> Performance improvement looks impressive:
>>>> >> https://en.wikipedia.org/wiki/User:Dragons_flight/Lua_performance
>>>> >>
>>>> >> Tom
>>>> >
>>>> >
>>>
>>>
>>>
>>>
>>>
------------------------------------------------------------------------------
>>> Minimize network downtime and maximize team effectiveness.
>>> Reduce network management and security costs.Learn how to hire
>>> the most talented Cisco Certified professionals. Visit the
>>> Employer Resources Portal
>>> http://www.cisco.com/web/learning/employer_resources/index.html
>>> _______________________________________________
>>> Dbpedia-discussion mailing list
>>> Dbpedia-discussion@lists.sourceforge.net
<mailto:Dbpedia-discussion@lists.sourceforge.net>
>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>>
>>
>>
>>
>> --
>> Kontokostas Dimitris
>>
>>
>>
>>
>
>
>
>
--
Pablo N. Mendes
http://pablomendes.com
--
Kind Regards
Mohamed Morsey
Department of Computer Science
University of Leipzig