On 17 April 2013 16:37, Sebastian Hellmann
<[email protected]> wrote:
> Am 17.04.2013 16:14, schrieb Jona Christopher Sahnwaldt:
>
>>
>>> By the way, where is that Wikidata ontology you are talking about?
>>> They have properties and categories, so you could say that they are
>>> building a terminology.
>>
>> I don't know enough about ontologies etc. I thought the properties and
>> "classes" that Wikidata is introducing may be called an ontology, but
>> I don't know.
>>
>> It looks like there is a kind of subsumption hierarchy. For example,
>> 8329 Speckman [2] is an asteroid [3] is an astronomical object [4] is
>> a physical body [5].
>>
>> Cheers,
>> JC
>>
>> [2] http://wikidata.org/wiki/Special:ItemByTitle/enwiki/8329_Speckman
>> [3] http://wikidata.org/wiki/Special:ItemByTitle/enwiki/Asteroid
>> [4]
>> http://wikidata.org/wiki/Special:ItemByTitle/enwiki/Astronomical_object
>> [5] http://wikidata.org/wiki/Special:ItemByTitle/enwiki/Physical_body
>
>
> Well, the question is whether they stay at a SKOS level, allowing cycles and
> inconsistencies, or whether they will have something similar to OWL.
> Otherwise you end up with the Wikipedia Category system and need something
> like Yago to clean it. See e.g.
> http://en.wikipedia.org/wiki/Category:Prime_Ministers_of_the_United_Kingdom
> Neither
> http://en.wikipedia.org/wiki/Book:Harold_Macmillan
> nor
> http://en.wikipedia.org/wiki/Timeline_of_Prime_Ministers_of_the_United_Kingdom
> nor
> http://en.wikipedia.org/wiki/Prime_Minister%27s_Spokesman
> is actually a prime minister.

From what I read about Wikidata I have the impression that the people
behind it aim at creating high quality data and metadata. Just look at
the quantity and quality of the discussions at
http://www.wikidata.org/wiki/Wikidata:Property_proposal/all . Or look
at the "DIY data wikis" already existing on some Wikipedias, for
example 
http://fr.wikipedia.org/wiki/Modèle:Données/Toulouse/évolution_population
or
http://de.wikipedia.org/wiki/Vorlage:Metadaten_Einwohnerzahl_DE-NI ,
and then imagine the people who maintain these pages work on Wikidata
with a vengeance. :-) I don't think they will allow a messy structure
like Wikipedia categories on Wikidata. But of course, that's just a
gut feeling. We will see.


Cheers,
JC

>
> In the end, you might call all terminologies and taxonomies ontologies, of
> course, but they can be more of a loose network with lots of inconsistencies.
>
> all the best,
> Sebastian
>
>
>>
>>> I would say that we can finally concentrate on the knowledge
>>> modeling aspect.
>>>
>>> -- Sebastian
>>>
>>>
>>>
>>>
>>> Am 17.04.2013 11:51, schrieb Jona Christopher Sahnwaldt:
>>>
>>> Hi everyone,
>>>
>>> Parsing Wikidata JSON pages and generating RDF is the simple part. :-)
>>>
>>> The hard part is merging Wikidata and Wikipedia data.
>>>
>>> There are several hundred properties in DBpedia [1], and several
>>> hundred in Wikidata [2]. Mapping them all is quite a bit of effort.
>>>
>>> If we use the approach that looks up a Wikidata property value when it
>>> finds {{#property:P123}} in a Wikipedia page, we don't have to create
>>> a mapping from Wikidata to DBpedia - we will continue to use the
>>> existing mappings from Wikipedia languages to the DBpedia ontology.
>>> That's why I thought it might be easier to go that way.
>>>
>>> But in the long run, that 'lookup' approach won't work, because there
>>> won't even be stuff like {{#property:P123}} left in Wikipedia, just
>>> {{Infobox Foo}}, and the template definition will pull all the data
>>> from Wikidata. Of course, we don't know when this will happen.
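As a rough illustration of this 'lookup' approach, here is a minimal sketch in Python, assuming a pre-extracted map from Wikidata property IDs to values for the current item; the function and pattern names are invented and not part of the actual framework:

```python
import re

# Matches inclusion calls like {{#property:P123}} in wiki markup.
PROPERTY_CALL = re.compile(r"\{\{#property:(P\d+)\}\}")

def resolve_property_calls(markup, item_properties):
    """Replace each {{#property:Pnn}} call with the value stored
    for that property in the looked-up Wikidata item."""
    def lookup(match):
        # Fall back to the original call if no value is known,
        # so nothing is silently dropped.
        return item_properties.get(match.group(1), match.group(0))
    return PROPERTY_CALL.sub(lookup, markup)
```

With the calls resolved, the existing infobox mappings could then run unchanged on the resulting markup.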
>>>
>>> On the other hand, in the long run - what is a mapping from the
>>> Wikidata ontology to the DBpedia ontology good for? Wikidata has
>>> several orders of magnitude more contributors than DBpedia. Just
>>> compare the recent changes:
>>> https://www.wikidata.org/wiki/Special:RecentChanges vs
>>> http://mappings.dbpedia.org/index.php/Special:RecentChanges . I think
>>> the Wikidata ontology will soon be larger and better than DBpedia's.
>>> Maybe it would be better to adapt DBpedia to Wikidata, not the other
>>> way round. At some point, we should probably start to use the Wikidata
>>> ontology instead of our own ontology.
>>>
>>>
>>> JC
>>>
>>>
>>> [1]
>>>
>>> http://mappings.dbpedia.org/index.php?title=Special:AllPages&namespace=202
>>> [2] https://www.wikidata.org/wiki/Wikidata:List_of_properties
>>>
>>> On 13 April 2013 16:40, Dimitris Kontokostas <[email protected]> wrote:
>>>
>>> Cool then! I thought it was longer...
>>> Pablo you can stop tapping and come back from the corner now :)
>>>
>>> BTW, this is Sebastian Hellmann's favorite quote lately ;)
>>>
>>> http://researchinprogress.tumblr.com/post/33221709696/postdocs-writing-a-paper-so-who-will-implement-this
>>>
>>> Cheers,
>>> Dimitris
>>>
>>>
>>>
>>> On Sat, Apr 13, 2013 at 5:24 PM, Jona Christopher Sahnwaldt
>>> <[email protected]> wrote:
>>>
>>> On 13 April 2013 12:28, Dimitris Kontokostas <[email protected]> wrote:
>>>
>>> Hi Pablo,
>>>
>>> Normally I would agree with you, but under the circumstances it's a
>>> little more complicated.
>>> My main point is that we don't have someone like Jona working full time
>>> on the framework anymore, so there is not enough time to do this right
>>> before the next release (1-2 months).
>>> Well, this is my estimation, but Jona is the actual expert in the DIEF
>>> internals, so maybe he can make a better estimate of the effort :)
>>>
>>> Implementing the refactoring I proposed at [1] would take three days.
>>> Maybe two. Maybe one if we're quick and don't encounter problems that
>>> I forgot when I wrote that proposal. Maybe I forgot a lot of stuff, so
>>> to be very pessimistic, I'd say a week. :-)
>>>
>>> Once we have that in place, generating data from JSON pages is
>>> relatively simple, since the pages are well structured.
>>>
>>> Cheers,
>>> JC
>>>
>>> [1] https://github.com/dbpedia/extraction-framework/pull/35
>>>
>>>
>>> On the other hand, we are lucky enough to have external contributions
>>> this year (like Andrea's), but this is a process that takes much longer,
>>> and we cannot guarantee that these contributions will be towards this
>>> goal.
>>>
>>> What I would suggest as a transition phase is to create the next DBpedia
>>> release now, while Wikipedia data is not affected at all. Then wait a
>>> couple of months to see where this thing actually goes and get better
>>> prepared.
>>>
>>> Cheers,
>>> Dimitris
>>>
>>>
>>> On Sat, Apr 13, 2013 at 12:36 PM, Pablo N. Mendes
>>> <[email protected]>
>>> wrote:
>>>
>>> Hi Dimitris,
>>>
>>> Maybe the lookup approach will give us some improvement over our next
>>> release ...but in the following release (in 1+ year) everything will be
>>> completely different again.
>>> Trying to re-parse already structured data will end up in a very
>>> complicated design that we might end up not using at all.
>>>
>>> Maybe I misunderstood this, but I was thinking of a very simple design
>>> here. You (and Jona) can estimate effort much better than me, due to my
>>> limited knowledge of the DEF internals.
>>>
>>> My suggestion was only to smooth the transition. In a year or so,
>>> perhaps all of the data will be in Wikidata, and we can just drop the
>>> markup parsing. But until that point, we need a hybrid solution. If I am
>>> seeing this right, the key-value store approach that was being discussed
>>> would allow us to bridge the gap between "completely wiki markup" and
>>> "completely Wikidata".
>>>
>>> Once we don't need markup parsing anymore, we just make the switch,
>>> since we'd already have all of the machinery to connect to Wikidata
>>> anyway (it is a requirement for the hybrid approach).
>>>
>>> Cheers,
>>> Pablo
>>>
>>>
>>>
>>>
>>> On Sat, Apr 13, 2013 at 10:20 AM, Jona Christopher Sahnwaldt
>>> <[email protected]> wrote:
>>>
>>> On 11 April 2013 13:47, Jona Christopher Sahnwaldt <[email protected]>
>>> wrote:
>>>
>>> All,
>>>
>>> I'd like to approach these decisions a bit more systematically.
>>>
>>> I'll try to list some of the most important open questions that come
>>> to mind regarding the development of DBpedia and Wikidata. I'll also
>>> add my own more or less speculative answers.
>>>
>>> I think we can't make good decisions about our way forward without
>>> clearly stating and answering these questions. We should ask the
>>> Wikidata people.
>>>
>>> @Anja: who should we ask at Wikidata? Just write to wikidata-l? Or
>>> is
>>> there a better way?
>>>
>>>
>>> 1. Will the Wikidata properties be messy (like Wikipedia) or clean
>>> (like DBpedia ontology)?
>>>
>>> My bet is that they will be clean.
>>>
>>> 2. When will Wikidata RDF dumps be available?
>>>
>>> I have no idea. Maybe two months, maybe two years.
>>>
>>> 3. When will data be *copied* from Wikipedia infoboxes (or other
>>> sources) to Wikidata?
>>>
>>> They're already starting. For example,
>>> wikidata/enwiki/Catherine_the_Great [1] has a lot of data.
>>>
>>> 4. When will data be *removed* from Wikipedia infoboxes?
>>>
>>> The inclusion syntax like {{#property:father}} doesn't work yet, so
>>> data cannot be removed. No idea when it will start. Maybe two
>>> months,
>>> maybe two years.
>>>
>>> This is starting sooner than I expected:
>>>
>>>
>>>
>>>
>>> http://meta.wikimedia.org/wiki/Wikidata/Deployment_Questions#When_will_this_be_deployed_on_my_Wikipedia.3F
>>>
>>> ----
>>>
>>> Phase 2 (infoboxes)
>>>
>>> When will this be deployed on my Wikipedia?
>>>
>>> It is already deployed on the following Wikipedias: it, he, hu, ru,
>>> tr, uk, uz, hr, bs, sr, sh. The deployment on English Wikipedia was
>>> planned for April 8 and on all remaining Wikipedias on April 10. This
>>> had to be postponed. New dates will be announced here as soon as we
>>> know them.
>>>
>>> ----
>>>
>>> Sounds like the inclusion syntax will be enabled on enwiki in the next
>>> few weeks. I would guess there are many active users or even bots who
>>> will replace data in infobox instances with inclusion calls. This means
>>> we will lose data if we don't extend our framework soon.
>>>
>>> Also see
>>> http://blog.wikimedia.de/2013/03/27/you-can-have-all-the-data/
>>>
>>> 5. What kind of datasets do we want to offer for download?
>>>
>>> I think that we should try to offer more or less the same datasets as
>>> before, which means that we have to merge Wikipedia and Wikidata
>>> extraction results. Even better: offer "pure" Wikipedia datasets
>>> (which will contain only the few inter-language links that remained in
>>> Wikipedia), "pure" Wikidata datasets (all the inter-language links
>>> that were moved, and the little bit of data that was already added),
>>> and "merged" datasets.
>>>
>>> 6. What kind of datasets do we want to load in the main SPARQL
>>> endpoint?
>>>
>>> Probably the "merged" datasets.
>>>
>>> 7. Do we want a new SPARQL endpoint for Wikidata data, for example at
>>> http://data.dbpedia.org/sparql?
>>>
>>> If yes, I guess this endpoint should only contain the "pure" Wikidata
>>> datasets.
>>>
>>> 8. What about the other DBpedia chapters?
>>>
>>> They certainly need the inter-language links, so we should prepare
>>> them. They'll probably also want sameAs links to data.dbpedia.org.
>>>
>>>
>>> So much for now. I'm sure there are many other questions that I forgot
>>> here, and different answers. Keep them coming. :-)
>>>
>>> Cheers,
>>> JC
>>>
>>>
>>> [1]
>>>
>>>
>>> http://www.wikidata.org/wiki/Special:ItemByTitle/enwiki/Catherine_the_Great
>>> = http://www.wikidata.org/wiki/Q36450
>>>
>>>
>>> On 8 April 2013 09:03, Dimitris Kontokostas <[email protected]>
>>> wrote:
>>>
>>> Hi Anja,
>>>
>>>
>>>
>>> On Mon, Apr 8, 2013 at 9:36 AM, Anja Jentzsch <[email protected]>
>>> wrote:
>>>
>>> Hi Dimitris,
>>>
>>>
>>> On Apr 8, 2013, at 8:29, Dimitris Kontokostas <[email protected]>
>>> wrote:
>>>
>>> Hi JC,
>>>
>>>
>>> On Sun, Apr 7, 2013 at 11:55 PM, Jona Christopher Sahnwaldt
>>> <[email protected]> wrote:
>>>
>>> Hi Dimitris,
>>>
>>> a lot of important remarks. I think we should discuss this in
>>> detail.
>>>
>>> On 7 April 2013 21:38, Dimitris Kontokostas <[email protected]>
>>> wrote:
>>>
>>> Hi,
>>>
>>> I disagree with this approach, and I believe that if we use this as
>>> our main strategy we will end up lacking in quality & completeness.
>>>
>>> Let's say that we will manage to handle {{#property P123}} or
>>> {{#property property name}} correctly & very efficiently. What will we
>>> do for templates like [1],
>>>
>>> I would think such templates are like many others for which we
>>> programmed special rules in Scala, like unit conversion templates etc.
>>> We could add special rules for templates that handle Wikidata, too.
>>> Not that I like this approach very much, but it worked (more or less)
>>> in the past.
>>>
>>> Lua scripts that use such templates
>>>
>>> For DBpedia, Lua scripts don't really differ from template
>>> definitions. We don't really parse them or use them in any way. If
>>> necessary, we try to reproduce their function in Scala. At least
>>> that's how we dealt with them in the past. Again, not beautiful, but
>>> also not a new problem.
>>>
>>> or for data in Wikidata that
>>> are not referenced from Wikipedia at all?
>>>
>>> We would lose that data, that's right.
>>>
>>> I know that we could achieve all this, but it would take too much
>>> effort to get this to 100%, and it would come with many bugs at the
>>> beginning.
>>> My point is that the data is already there and very well structured.
>>> Why do we need to parse templates & Lua scripts just to get it from
>>> Wikidata in the end?
>>>
>>>
>>> There are two ways to integrate Wikidata in Wikipedia: Lua scripts or
>>> the inclusion syntax. So it would be neat to cover both.
>>>
>>> Sure, I agree: template rendering is a feature we wanted (and users
>>> asked for) for many years.
>>> We'll have to implement a MW rendering engine in Scala that could be
>>> useful for many, many things, but I don't think that Wikidata is the
>>> reason we should build this.
>>>
>>> I don't know Lua or whether this is allowed syntax, but I'd expect
>>> something similar from hard-core Wikipedians sometime soon:
>>>
>>> for _, p in ipairs(properties) do
>>>   if condition1 and condition2 and condition3 then
>>>     load("{{#property:" .. p .. "}}")
>>>   end
>>> end
>>>
>>> So we will either miss a lot of data or put too much effort into
>>> something that is already very well structured.
>>> At least at this point, where nothing is clear yet.
>>>
>>> Cheers,
>>> Dimitris
>>>
>>> Cheers,
>>> Anja
>>>
>>>
>>>
>>> Maybe the lookup approach will give us some improvement over our next
>>> release (if we manage to implement it till then). Most of the data is
>>> still in Wikipedia, and Lua scripts & Wikidata templates are not so
>>> complex yet.
>>> But in the following release (in 1+ year) everything will be completely
>>> different again. The reason is that Wikidata started operations exactly
>>> one year ago and was partly pushed into production about two months
>>> ago, so I'd expect a very big boost in the following months.
>>>
>>> I think so too.
>>>
>>> My point is that Wikidata is a completely new source and we should see
>>> it as such. Trying to re-parse already structured data will end up in a
>>> very complicated design that we might end up not using at all.
>>>
>>> What do you mean by "re-parse already structured data"?
>>>
>>> On the other hand, Wikidata data, although well structured, can still
>>> be compared to our raw infobox extractor (regarding naming variance).
>>>
>>> You mean naming variance of properties? I would expect Wikidata to be
>>> much better than Wikipedia in this respect. I think that's one of the
>>> goals of Wikidata: to have one single property for birth date and use
>>> this property for all types of persons. Apparently, to add a new
>>> Wikidata property, one must go through a community process [1].
>>>
>>> I don't have the link, but I read that there is no restriction on
>>> that. The goal is to provide structured data, and the community will
>>> need to handle duplicates.
>>> This is yet another Wikipedia community, so even if it is a lot
>>> stricter, I'd expect variations here too.
>>>
>>>
>>> I suggest
>>> that we focus on mediating this data to our DBpedia ontology
>>>
>>> This is the really interesting stuff. How could we do this? Will we
>>> let users of the mappings wiki define mappings between Wikidata
>>> properties and DBpedia ontology properties? There are a lot of
>>> possibilities.
>>>
>>> Yup, many interesting possibilities :) The tricky part will be with
>>> the classes, but this is a GSoC idea, so the students will have to
>>> figure this out.
>>> I was also thinking of a Greasemonkey script where mappers could
>>> navigate in Wikidata and see (or even do) the mappings right on
>>> Wikidata.org :)
>>>
>>> and then fusing
>>> it with data from other DBpedia-language editions.
>>>
>>> Do you mean merging data that's already on Wikidata with stuff that's
>>> still in Wikipedia pages?
>>>
>>> The simplest thing we could do is the following:
>>> let's say Q1 is a Wikidata item linking to article W1, and Wikidata
>>> property P1 is mapped to dbpedia-owl:birthDate.
>>> For Q1 P1 "1/1/2000" we could assume W1 birthDate "1/1/2000" and load
>>> the second in dbpedia.org.
>>> Even without any intelligence at all, this could give very good results.
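A minimal sketch of this merging idea in Python, with invented data structures (the real Wikidata dumps and DIEF internals look different): a Wikidata item carries a sitelink to its Wikipedia article plus property claims, and a hand-maintained mapping translates Wikidata property IDs to DBpedia ontology properties.

```python
# Hypothetical mapping from Wikidata property IDs to DBpedia ontology
# properties; in practice this would come from the mappings wiki.
PROPERTY_MAPPING = {"P1": "dbpedia-owl:birthDate"}

def item_to_triples(item):
    """Turn a Wikidata item's claims into triples about the Wikipedia
    article it links to, via the property mapping.

    item: {"sitelink": "W1", "claims": {"P1": "1/1/2000", ...}}
    """
    triples = []
    for prop, value in item["claims"].items():
        mapped = PROPERTY_MAPPING.get(prop)
        if mapped is not None:
            # Unmapped properties are simply skipped.
            triples.append((item["sitelink"], mapped, value))
    return triples
```

Loading the resulting triples into dbpedia.org alongside the Wikipedia-extracted data would give exactly the "fusing" step described above.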
>>>
>>> Cheers,
>>> Dimitris
>>>
>>>
>>> So much for my specific questions.
>>>
>>> The most important question is: where do we expect Wikidata (and
>>> DBpedia) to be in one, two, three years?
>>>
>>> Cheers,
>>> JC
>>>
>>> [1] http://www.wikidata.org/wiki/Wikidata:Property_proposal
>>>
>>> Best,
>>> Dimitris
>>>
>>> [1] http://it.wikipedia.org/wiki/Template:Wikidata
>>>
>>>
>>> On Sun, Apr 7, 2013 at 3:36 AM, Jona Christopher Sahnwaldt
>>> <[email protected]>
>>> wrote:
>>>
>>> When I hear "database", I think "network", which is of course
>>> several
>>> orders of magnitude slower than a simple map access, but MapDB
>>> looks
>>> really cool. No network calls, just method calls. Nice!
>>>
>>> On 7 April 2013 01:10, Pablo N. Mendes <[email protected]>
>>> wrote:
>>>
>>> My point was rather that there are implementations out there that
>>> support both in-memory and on-disk storage. So there is no need to
>>> choose between a map and a database, because you can also access a
>>> database via a map interface.
>>> http://www.kotek.net/blog/3G_map
>>>
>>> JDBM seems to be good both for speed and memory.
>>>
>>> Cheers,
>>> Pablo
>>>
>>>
>>> On Sat, Apr 6, 2013 at 10:41 PM, Jona Christopher Sahnwaldt
>>> <[email protected]> wrote:
>>>
>>> On 6 April 2013 15:34, Mohamed Morsey
>>> <[email protected]>
>>> wrote:
>>>
>>> Hi Pablo, Jona, and all,
>>>
>>>
>>> On 04/06/2013 01:56 PM, Pablo N. Mendes wrote:
>>>
>>>
>>> I'd say this topic can safely move out of dbpedia-discussion and to
>>> dbpedia-developers now. :)
>>>
>>> I agree with Jona. With one small detail: perhaps it is better if we
>>> don't load everything in memory, if we use a fast database such as
>>> Berkeley DB or JDBM3. They would also allow you to use in-memory
>>> storage when you can splurge, or disk-backed storage when restricted.
>>> What do you think?
>>>
>>>
>>> I agree with Pablo's idea, as it will work in both dump and live
>>> modes.
>>> Actually, for live extraction we already need a lot of memory, as we
>>> have a running Virtuoso instance that should be updated by the
>>> framework, and we have a local mirror of Wikipedia which uses MySQL as
>>> back-end storage.
>>> So, I would prefer saving as much memory as possible.
>>>
>>> Let's make it pluggable and configurable then. If you're more
>>> concerned with speed than memory (as in the dump extraction), use a
>>> map. If it's the other way round, use some kind of database.
>>>
>>> I expect the interface to be very simple: for Wikidata item X, give me
>>> the value of property Y.
>>>
>>> The only problem I see is that we currently have no usable
>>> configuration in DBpedia. At least for the dump extraction - I don't
>>> know about the live extraction. The dump extraction configuration
>>> consists of flat files and static fields in some classes, which is
>>> pretty awful and would make it rather hard to exchange one
>>> implementation of this WikidataQuery interface for another.
>>>
>>>
>>>
>>> Cheers,
>>> Pablo
>>>
>>>
>>> On Fri, Apr 5, 2013 at 10:01 PM, Jona Christopher
>>> Sahnwaldt
>>> <[email protected]> wrote:
>>>
>>> On 5 April 2013 21:27, Andrea Di Menna
>>> <[email protected]>
>>> wrote:
>>>
>>> Hi Dimitris,
>>>
>>> I am not completely getting your point.
>>>
>>> How would you handle the following example? (supposing the following
>>> will be possible with Wikipedia/Wikidata)
>>>
>>> Suppose you have
>>>
>>> {{Infobox:Test
>>> | name = {{#property:p45}}
>>> }}
>>>
>>> and a mapping
>>>
>>> {{PropertyMapping | templateProperty = name | ontologyProperty =
>>> foaf:name}}
>>>
>>> what would happen when running the MappingExtractor?
>>> Which RDF triples would be generated?
>>>
>>> I think there are two questions here, and two very different
>>> approaches.
>>>
>>> 1. In the near term, I would expect that Wikipedia templates are
>>> modified like in your example.
>>>
>>> How could/should DBpedia deal with this? The simplest solution seems
>>> to be that during a preliminary step, we extract data from Wikidata
>>> and store it. During the main extraction, whenever we find a reference
>>> to Wikidata, we look it up and generate a triple as usual. Not a huge
>>> change.
>>>
>>> 2. In the long run though, when all data is moved to Wikidata, all
>>> instances of a certain infobox type will look the same. It doesn't
>>> matter anymore if an infobox is about Germany or Italy, because they
>>> all use the same properties:
>>>
>>> {{Infobox country
>>> | capital = {{#property:p45}}
>>> | population = {{#property:p42}}
>>> ... etc. ...
>>> }}
>>>
>>> I guess Wikidata already thought of this and has plans to then replace
>>> the whole infobox with a small construct that simply instructs
>>> MediaWiki to pull all data for this item from Wikidata and display an
>>> infobox. In this case, there will be nothing left to extract for
>>> DBpedia.
>>>
>>> Implementation detail: we shouldn't use a SPARQL store to look up
>>> Wikidata data; we should keep it in memory. A SPARQL call will
>>> certainly be at least 100 times slower than a lookup in a map, but
>>> probably 10000 times or more. This matters because there will be
>>> hundreds of millions of lookup calls during an extraction. Keeping all
>>> inter-language links in memory takes about 4 or 5 GB - not much. Of
>>> course, keeping all Wikidata data in memory would take between 10 and
>>> 100 times as much RAM.
>>>
>>> Cheers,
>>> JC
>>>
>>> Cheers
>>> Andrea
>>>
>>>
>>> 2013/4/5 Dimitris Kontokostas <[email protected]>
>>>
>>> Hi,
>>>
>>> For me there is no reason to complicate the DBpedia framework by
>>> resolving Wikidata data / templates.
>>> What we could do is (try to) provide a semantic mirror of Wikidata at,
>>> e.g., data.dbpedia.org. We should simplify it by mapping the data to
>>> the DBpedia ontology and then use it like any other language edition we
>>> have (e.g. nl.dbpedia.org).
>>>
>>> In dbpedia.org we already aggregate data from other language editions.
>>> For now it is mostly labels & abstracts, but we can also fuse Wikidata
>>> data. This way, whatever is missing from the Wikipedia dumps will in
>>> the end be filled in by the Wikidata dumps.
>>>
>>> Best,
>>> Dimitris
>>>
>>>
>>> On Fri, Apr 5, 2013 at 9:49 PM, Julien Plu
>>> <[email protected]> wrote:
>>>
>>> OK, thanks for the clarification :-) Perfect; now I'm just waiting for
>>> the dump of this data to become available.
>>>
>>> Best.
>>>
>>> Julien Plu.
>>>
>>>
>>> 2013/4/5 Jona Christopher Sahnwaldt
>>> <[email protected]>
>>>
>>> On 5 April 2013 19:59, Julien Plu
>>> <[email protected]>
>>> wrote:
>>>
>>> Hi,
>>>
>>> @Anja: Do you have a blog post or something like that about the RDF
>>> dump of Wikidata?
>>>
>>>
>>> http://meta.wikimedia.org/wiki/Wikidata/Development/RDF
>>>
>>> @Anja: do you know when RDF dumps are planned to be
>>> available?
>>>
>>> Will the French Wikidata also provide its data in RDF?
>>>
>>> There is only one Wikidata - neither English nor French nor any other
>>> language. It's just data. There are labels in different languages, but
>>> the data itself is language-agnostic.
>>>
>>> This news interests me very much.
>>>
>>> Best
>>>
>>> Julien Plu.
>>>
>>>
>>> 2013/4/5 Tom Morris <[email protected]>
>>>
>>> On Fri, Apr 5, 2013 at 9:40 AM, Jona Christopher
>>> Sahnwaldt
>>> <[email protected]> wrote:
>>>
>>> thanks for the heads-up!
>>>
>>> On 5 April 2013 10:44, Julien Plu
>>> <[email protected]>
>>> wrote:
>>>
>>> Hi,
>>>
>>> I saw a few days ago that MediaWiki has, since about a month ago,
>>> allowed the creation of infoboxes (or parts of them) with the Lua
>>> scripting language.
>>> http://www.mediawiki.org/wiki/Lua_scripting
>>>
>>> So my question is: if all the data in the Wikipedia infoboxes is in
>>> Lua scripts, will DBpedia still be able to retrieve all the data as
>>> usual?
>>>
>>> I'm not 100% sure, and we should look into this, but I think that Lua
>>> is only used in template definitions, not in template calls or other
>>> places in content pages. DBpedia does not parse template definitions,
>>> only content pages. The content pages probably will only change in
>>> minor ways, if at all. For example, {{Foo}} might change to
>>> {{#invoke:Foo}}. But that's just my preliminary understanding after
>>> looking through a few tutorial pages.
>>>
>>> As far as I can see, the template calls are unchanged for all the
>>> templates, which makes sense when you consider that some of the
>>> templates that they've upgraded to use Lua, like Template:Coord, are
>>> used on almost a million pages.
>>>
>>> Here are the ones which have been updated so far:
>>> https://en.wikipedia.org/wiki/Category:Lua-based_templates
>>>
>>> Performance improvement looks impressive:
>>> https://en.wikipedia.org/wiki/User:Dragons_flight/Lua_performance
>>>
>>> Tom
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Minimize network downtime and maximize team
>>> effectiveness.
>>> Reduce network management and security costs.Learn
>>> how
>>> to
>>> hire
>>> the most talented Cisco Certified professionals.
>>> Visit
>>> the
>>> Employer Resources Portal
>>>
>>>
>>>
>>> http://www.cisco.com/web/learning/employer_resources/index.html
>>> _______________________________________________
>>> Dbpedia-discussion mailing list
>>> [email protected]
>>>
>>>
>>>
>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>>
>>>
>>>
>>> --
>>> Kontokostas Dimitris
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Pablo N. Mendes
>>> http://pablomendes.com
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Kind Regards
>>> Mohamed Morsey
>>> Department of Computer Science
>>> University of Leipzig
>>>
>>>
>>>
>>> --
>>>
>>> Pablo N. Mendes
>>> http://pablomendes.com
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Dbpedia-developers mailing list
>>> [email protected]
>>>
>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
>>>
>>>
>>>
>>> --
>>> Kontokostas Dimitris
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Kontokostas Dimitris
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Precog is a next-generation analytics platform capable of advanced
>>> analytics on semi-structured data. The platform includes APIs for
>>> building
>>> apps and a phenomenal toolset for data science. Developers can use
>>> our toolset for easy data analysis & visualization. Get a free
>>> account!
>>> http://www2.precog.com/precogplatform/slashdotnewsletter
>>>
>>>
>>>
>>> --
>>>
>>> Pablo N. Mendes
>>> http://pablomendes.com
>>>
>>>
>>>
>>> --
>>> Kontokostas Dimitris
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Dipl. Inf. Sebastian Hellmann
>>> Department of Computer Science, University of Leipzig
>>> Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
>>> http://dbpedia.org/Wiktionary , http://dbpedia.org
>>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>>> Research Group: http://aksw.org
>
>
>
> --
> Dipl. Inf. Sebastian Hellmann
> Department of Computer Science, University of Leipzig
> Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
> http://dbpedia.org/Wiktionary , http://dbpedia.org
> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
> Research Group: http://aksw.org
