Hi Pablo, Jona, and all,

On 04/06/2013 01:56 PM, Pablo N. Mendes wrote:

I'd say this topic can safely move out of dbpedia-discussion and over to dbpedia-developers now. :)

I agree with Jona, with one small detail: perhaps it is better if we don't load everything into memory, and instead use a fast embedded database such as Berkeley DB or JDBM3. They would also allow you to run in-memory when you can splurge on RAM, or disk-backed when memory is restricted. What do you think?

I agree with Pablo's idea, as it will work in both dump and live modes.
Actually, for live extraction we already need a lot of memory: we have a running Virtuoso instance that is updated by the framework, and a local mirror of Wikipedia that uses MySQL as back-end storage.
So I would prefer to save as much memory as possible.


Cheers,
Pablo
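
A minimal sketch of that pluggable-store idea, in Java. The interface and both class names are made up for illustration, not part of the framework; Berkeley DB JE and JDBM3 each expose roughly map-like APIs that could sit behind the disk-backed variant:

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical abstraction: the extractors only see a key-value store,
    // so the backing implementation can be swapped via configuration.
    interface KeyValueStore {
        String get(String key);
        void put(String key, String value);
    }

    // In-memory variant, for machines where we can splurge on RAM.
    class InMemoryStore implements KeyValueStore {
        private final Map<String, String> map = new HashMap<String, String>();
        public String get(String key) { return map.get(key); }
        public void put(String key, String value) { map.put(key, value); }
    }

    // Disk-backed variant: Berkeley DB JE or JDBM3 would plug in here;
    // the bodies are left empty because the exact calls depend on the
    // library chosen.
    class DiskBackedStore implements KeyValueStore {
        public String get(String key) { /* database lookup */ return null; }
        public void put(String key, String value) { /* database insert */ }
    }

The extraction code would then receive whichever implementation a configuration flag selects.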


    On Fri, Apr 5, 2013 at 10:01 PM, Jona Christopher Sahnwaldt <j...@sahnwaldt.de> wrote:

    On 5 April 2013 21:27, Andrea Di Menna <ninn...@gmail.com> wrote:
    > Hi Dimitris,
    >
    > I am not completely getting your point.
    >
    > How would you handle the following example? (supposing the following
    > will be possible with Wikipedia/Wikidata)
    >
    > Suppose you have
    >
    > {{Infobox:Test
    > | name = {{#property:p45}}
    > }}
    >
    > and a mapping
    >
    > {{PropertyMapping | templateProperty = name | ontologyProperty = foaf:name}}
    >
    > what would happen when running the MappingExtractor?
    > Which RDF triples would be generated?

    I think there are two questions here, and two very different
    approaches.

    1. In the near term, I would expect that Wikipedia templates will be
    modified as in your example.

    How could/should DBpedia deal with this? The simplest solution seems
    to be a preliminary step in which we extract data from Wikidata and
    store it. During the main extraction, whenever we find a reference
    to Wikidata, we look it up and generate a triple as usual. Not a huge
    change.
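
    A rough sketch of that two-phase lookup; all names here are
    illustrative, not the framework's actual API:

        import java.util.HashMap;
        import java.util.Map;

        class WikidataLookup {
            // Filled during the preliminary pass over the Wikidata dump,
            // keyed by "<item>|<property>", e.g. "Q183|p45".
            private final Map<String, String> values =
                new HashMap<String, String>();

            void load(String item, String property, String value) {
                values.put(item + "|" + property, value);
            }

            // Called during the main extraction whenever a template value
            // is a {{#property:pNN}} reference instead of a literal.
            String resolve(String item, String property) {
                return values.get(item + "|" + property);
            }
        }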

    2. In the long run though, when all data is moved to Wikidata, all
    instances of a certain infobox type will look the same. It doesn't
    matter anymore if an infobox is about Germany or Italy, because they
    all use the same properties:

    {{Infobox country
    | capital = {{#property:p45}}
    | population = {{#property:p42}}
    ... etc. ...
    }}

    I guess Wikidata already thought of this and plans to eventually
    replace the whole infobox by a small construct that simply instructs
    MediaWiki to pull all data for the item from Wikidata and display an
    infobox. In that case, there will be nothing left for DBpedia to
    extract.

    Implementation detail: we shouldn't use a SPARQL store to look up
    Wikidata data; we should keep it in memory. A SPARQL call will
    certainly be at least 100 times slower than a lookup in an in-memory
    map, and probably 10,000 times or more. This matters because there
    will be hundreds of millions of lookup calls during an extraction.
    Keeping all inter-language links in memory takes about 4 or 5 GB - not
    much. Of course, keeping all Wikidata data in memory would take
    between 10 and 100 times as much RAM.
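
    Back-of-envelope, with assumed figures (rough estimates, not
    measurements): an in-memory map lookup costs on the order of 100 ns,
    a SPARQL call over HTTP a couple of milliseconds:

        public class LookupCostEstimate {
            public static void main(String[] args) {
                // Assumed figures; adjust to taste.
                long lookups = 300000000L;    // lookups per extraction run
                double mapNs = 100.0;         // ~100 ns per in-memory get
                double sparqlNs = 2000000.0;  // ~2 ms per SPARQL round trip
                // 300M * 100 ns = 30 s; 300M * 2 ms = about 7 days.
                System.out.printf("map:    %.1f minutes%n",
                        lookups * mapNs / 6e10);
                System.out.printf("sparql: %.1f days%n",
                        lookups * sparqlNs / 8.64e13);
            }
        }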

    Cheers,
    JC

    >
    > Cheers
    > Andrea
    >
    >
    > 2013/4/5 Dimitris Kontokostas <jimk...@gmail.com>
    >>
    >> Hi,
    >>
    >> For me there is no reason to complicate the DBpedia framework by
    >> resolving Wikidata data / templates.
    >> What we could do is (try to) provide a semantic mirror of Wikidata
    >> at, e.g., data.dbpedia.org. We should simplify it by mapping the
    >> data to the DBpedia ontology and then use it like any other language
    >> edition we have (e.g. nl.dbpedia.org).
    >>
    >> In dbpedia.org we already aggregate data from other language
    >> editions. For now it is mostly labels & abstracts, but we can also
    >> fuse Wikidata data. This way, whatever is missing from the Wikipedia
    >> dumps will in the end be filled in by the Wikidata dumps.
    >>
    >> Best,
    >> Dimitris
    >>
    >>
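
    A sketch of the mapping step Dimitris describes, reusing the made-up
    p45/p42 property IDs from the infobox example above. The mapping table
    is an illustrative assumption; the real one would come from the
    mappings wiki, not be hard-coded:

        import java.util.HashMap;
        import java.util.Map;

        class WikidataToDBpedia {
            // Illustrative Wikidata property -> DBpedia ontology mapping.
            private static final Map<String, String> PROPERTY_MAP =
                new HashMap<String, String>();
            static {
                PROPERTY_MAP.put("p45",
                    "http://dbpedia.org/ontology/capital");
                PROPERTY_MAP.put("p42",
                    "http://dbpedia.org/ontology/populationTotal");
            }

            // One Wikidata statement -> one N-Triples line under a
            // data.dbpedia.org namespace; null if the property is unmapped.
            static String toTriple(String item, String property,
                                   String value) {
                String predicate = PROPERTY_MAP.get(property);
                if (predicate == null) return null;
                return "<http://data.dbpedia.org/resource/" + item + "> <"
                        + predicate + "> \"" + value + "\" .";
            }
        }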
    >> On Fri, Apr 5, 2013 at 9:49 PM, Julien Plu
    >> <julien....@redaction-developpez.com> wrote:
    >>>
    >>> OK, thanks for the clarification :-) That's perfect; now we are
    >>> just waiting for the dump of these data to become available.
    >>>
    >>> Best.
    >>>
    >>> Julien Plu.
    >>>
    >>>
    >>> 2013/4/5 Jona Christopher Sahnwaldt <j...@sahnwaldt.de>
    >>>>
    >>>> On 5 April 2013 19:59, Julien Plu
    >>>> <julien....@redaction-developpez.com> wrote:
    >>>> > Hi,
    >>>> >
    >>>> > @Anja: Do you have a blog post or something similar about the
    >>>> > RDF dump of Wikidata?
    >>>>
    >>>> http://meta.wikimedia.org/wiki/Wikidata/Development/RDF
    >>>>
    >>>> @Anja: do you know when RDF dumps are planned to be available?
    >>>>
    >>>> > Will the French Wikidata also provide its data in RDF?
    >>>>
    >>>> There is only one Wikidata - neither English nor French nor any
    >>>> other language. It's just data. There are labels in different
    >>>> languages, but the data itself is language-agnostic.
    >>>>
    >>>> >
    >>>> > This news interests me greatly.
    >>>> >
    >>>> > Best
    >>>> >
    >>>> > Julien Plu.
    >>>> >
    >>>> >
    >>>> > 2013/4/5 Tom Morris <tfmor...@gmail.com>
    >>>> >>
    >>>> >> On Fri, Apr 5, 2013 at 9:40 AM, Jona Christopher Sahnwaldt
    >>>> >> <j...@sahnwaldt.de> wrote:
    >>>> >>>
    >>>> >>>
    >>>> >>> thanks for the heads-up!
    >>>> >>>
    >>>> >>> On 5 April 2013 10:44, Julien Plu
    >>>> >>> <julien....@redaction-developpez.com> wrote:
    >>>> >>> > Hi,
    >>>> >>> >
    >>>> >>> > I saw a few days ago that, since about a month, MediaWiki
    >>>> >>> > allows creating infoboxes (or parts of them) with the Lua
    >>>> >>> > scripting language.
    >>>> >>> > http://www.mediawiki.org/wiki/Lua_scripting
    >>>> >>> >
    >>>> >>> > So my question is: if all the data in the Wikipedia
    >>>> >>> > infoboxes is produced by Lua scripts, will DBpedia still be
    >>>> >>> > able to retrieve all the data as usual?
    >>>> >>>
    >>>> >>> I'm not 100% sure, and we should look into this, but I think
    >>>> >>> that Lua is only used in template definitions, not in template
    >>>> >>> calls or other places in content pages. DBpedia does not parse
    >>>> >>> template definitions, only content pages. The content pages
    >>>> >>> probably will only change in minor ways, if at all. For
    >>>> >>> example, {{Foo}} might change to {{#invoke:Foo}}. But that's
    >>>> >>> just my preliminary understanding after looking through a few
    >>>> >>> tutorial pages.
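
    To illustrate the point about template calls: a name-based recognizer
    could treat both forms alike. This is a hypothetical sketch, not the
    framework's actual parser (real wiki markup needs more than a regex):

        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        class TemplateCallName {
            // Matches {{Foo|...}} as well as {{#invoke:Foo|...}} and
            // captures the template/module name.
            private static final Pattern CALL =
                Pattern.compile("\\{\\{\\s*(?:#invoke:)?([^|}]+)");

            static String name(String wikitext) {
                Matcher m = CALL.matcher(wikitext);
                return m.find() ? m.group(1).trim() : null;
            }
        }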
    >>>> >>
    >>>> >>
    >>>> >> As far as I can see, the template calls are unchanged for all
    >>>> >> the templates, which makes sense when you consider that some of
    >>>> >> the templates they've upgraded to use Lua, like Template:Coord,
    >>>> >> are used on almost a million pages.
    >>>> >>
    >>>> >> Here are the ones which have been updated so far:
    >>>> >> https://en.wikipedia.org/wiki/Category:Lua-based_templates
    >>>> >> The performance improvement looks impressive:
    >>>> >> https://en.wikipedia.org/wiki/User:Dragons_flight/Lua_performance
    >>>> >>
    >>>> >> Tom
    >>>> >
    >>>> >
    >>>
    >>
    >>
    >>
    >> --
    >> Kontokostas Dimitris
    >>
    >

    




--

Pablo N. Mendes
http://pablomendes.com




--
Kind Regards
Mohamed Morsey
Department of Computer Science
University of Leipzig

