On 13 April 2013 12:28, Dimitris Kontokostas <[email protected]> wrote:
> Hi Pablo,
>
> Normally I would agree with you, but under the circumstances it's a little
> more complicated.
> My main point is that we don't have someone like Jona working full time on
> the framework anymore, so there is not enough time to do this right before
> the next release (1-2 months).
> Well, this is my estimation but Jona is the actual expert in the DIEF
> internals, so maybe he can make a better estimate on the effort :)

Implementing the refactoring I proposed at [1] would take three days.
Maybe two. Maybe one if we're quick and don't encounter problems that
I forgot when I wrote that proposal. Maybe I forgot a lot of stuff, so
to be very pessimistic, I'd say a week. :-)

Once we have that in place, generating data from JSON pages is
relatively simple, since the pages are well structured.
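As a rough illustration of why well-structured JSON pages are easy to handle, here is a minimal sketch in Scala. The flat page shape assumed below is an invention for illustration only; the real Wikidata JSON format is richer (typed values, qualifiers, references):

```scala
// Minimal sketch: pull flat "pNN": "value" claims out of a Wikidata JSON page.
// The simplified page shape here is an assumption, not the actual dump format.
object WikidataJsonSketch {
  private val claim = """"(p\d+)"\s*:\s*"([^"]*)"""".r

  // Returns a property-id -> value map for every simple string claim found.
  def claims(json: String): Map[String, String] =
    claim.findAllMatchIn(json).map(m => m.group(1) -> m.group(2)).toMap
}
```

For example, `WikidataJsonSketch.claims("""{"claims": {"p45": "Berlin"}}""")` yields `Map("p45" -> "Berlin")`.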

Cheers,
JC

[1] https://github.com/dbpedia/extraction-framework/pull/35


> On the other hand, we are lucky enough to have external contributions this
> year (like Andrea's) but this is a process that takes much longer and we
> cannot guarantee that these contributions will be towards this goal.
>
> What I would suggest as a transition phase is to create the next DBpedia
> release now, while Wikipedia data is not affected at all. Then wait a couple
> of months to see where this thing actually goes and get better prepared.
>
> Cheers,
> Dimitris
>
>
> On Sat, Apr 13, 2013 at 12:36 PM, Pablo N. Mendes <[email protected]>
> wrote:
>>
>>
>> Hi Dimitris,
>>
>> > Maybe the lookup approach will give us some improvement over our next
>> > release ...but in the following release (in 1+ year) everything will be
>> > completely different again.
>> > Trying to re-parse already structured data will end up in a very
>> > complicated design that we might end up not using at all.
>>
>> Maybe I misunderstood this, but I was thinking of a very simple design
>> here. You (and Jona) can estimate effort much better than me, due to my
>> limited knowledge of the DIEF internals.
>>
>> My suggestion was only to smooth the transition. In a year or so,
>> perhaps all of the data will be in Wikidata, and we can just drop the markup
>> parsing. But until that point, we need a hybrid solution. If I am seeing
>> this right, the key-value store approach that was being discussed would
>> allow us to bridge the gap between "completely wiki markup" and "completely
>> wikidata".
>>
>> Once we don't need markup parsing anymore, we just make the switch, since
>> we'd already have all of the machinery to connect to wikidata anyways (it is
>> a requirement for the hybrid approach).
>>
>> Cheers,
>> Pablo
>>
>>
>>
>>
>> On Sat, Apr 13, 2013 at 10:20 AM, Jona Christopher Sahnwaldt
>> <[email protected]> wrote:
>>>
>>> On 11 April 2013 13:47, Jona Christopher Sahnwaldt <[email protected]>
>>> wrote:
>>> > All,
>>> >
>>> > I'd like to approach these decisions a bit more systematically.
>>> >
>>> > I'll try to list some of the most important open questions that come
>>> > to mind regarding the development of DBpedia and Wikidata. I'll also
>>> > add my own more or less speculative answers.
>>> >
>>> > I think we can't make good decisions about our way forward without
>>> > clearly stating and answering these questions. We should ask the
>>> > Wikidata people.
>>> >
>>> > @Anja: who should we ask at Wikidata? Just write to wikidata-l? Or is
>>> > there a better way?
>>> >
>>> >
>>> > 1. Will the Wikidata properties be messy (like Wikipedia) or clean
>>> > (like DBpedia ontology)?
>>> >
>>> > My bet is that they will be clean.
>>> >
>>> > 2. When will Wikidata RDF dumps be available?
>>> >
>>> > I have no idea. Maybe two months, maybe two years.
>>> >
>>> > 3. When will data be *copied* from Wikipedia infoboxes (or other
>>> > sources) to Wikidata?
>>> >
>>> > They're already starting. For example,
>>> > wikidata/enwiki/Catherine_the_Great [1] has a lot of data.
>>> >
>>> > 4. When will data be *removed* from Wikipedia infoboxes?
>>> >
>>> > The inclusion syntax like {{#property:father}} doesn't work yet, so
>>> > data cannot be removed. No idea when it will start. Maybe two months,
>>> > maybe two years.
>>>
>>> This is starting sooner than I expected:
>>>
>>>
>>> http://meta.wikimedia.org/wiki/Wikidata/Deployment_Questions#When_will_this_be_deployed_on_my_Wikipedia.3F
>>>
>>> ----
>>>
>>> Phase 2 (infoboxes)
>>>
>>> When will this be deployed on my Wikipedia?
>>>
>>> It is already deployed on the following Wikipedias: it, he, hu, ru,
>>> tr, uk, uz, hr, bs, sr, sh. The deployment on English Wikipedia was
>>> planned for April 8 and on all remaining Wikipedias on April 10. This
>>> had to be postponed. New dates will be announced here as soon as we
>>> know them.
>>>
>>> ----
>>>
>>> Sounds like the inclusion syntax will be enabled on enwiki in the next
>>> few weeks. I would guess there are many active users or even bots who
>>> will replace data in infobox instances with inclusion calls. This means
>>> we will lose data if we don't extend our framework soon.
>>>
>>> Also see http://blog.wikimedia.de/2013/03/27/you-can-have-all-the-data/
>>>
>>> >
>>> > 5. What kind of datasets do we want to offer for download?
>>> >
>>> > I think that we should try to offer more or less the same datasets as
>>> > before, which means that we have to merge Wikipedia and Wikidata
>>> > extraction results. Even better: offer "pure" Wikipedia datasets
>>> > (which will contain only the few inter-language links that remained in
>>> > Wikipedia), "pure" Wikidata datasets (all the inter-language links
>>> > that were moved, and the little bit of data that was already added)
>>> > and "merged" datasets.
>>> >
>>> > 6. What kind of datasets do we want to load in the main SPARQL
>>> > endpoint?
>>> >
>>> > Probably the "merged" datasets.
>>> >
>>> > 7. Do we want a new SPARQL endpoint for Wikidata data, for example at
>>> > http://data.dbpedia.org/sparql?
>>> >
>>> > If yes, I guess this endpoint should only contain the "pure" Wikidata
>>> > datasets.
>>> >
>>> > 8. What about the other DBpedia chapters?
>>> >
>>> > They certainly need the inter-language links, so we should prepare
>>> > them. They'll probably also want sameAs links to data.dbpedia.org.
>>> >
>>> >
>>> > So much for now. I'm sure there are many other questions that I forgot
>>> > here and different answers. Keep them coming. :-)
>>> >
>>> > Cheers,
>>> > JC
>>> >
>>> >
>>> > [1]
>>> > http://www.wikidata.org/wiki/Special:ItemByTitle/enwiki/Catherine_the_Great
>>> > = http://www.wikidata.org/wiki/Q36450
>>> >
>>> >
>>> > On 8 April 2013 09:03, Dimitris Kontokostas <[email protected]> wrote:
>>> >> Hi Anja,
>>> >>
>>> >>
>>> >>
>>> >> On Mon, Apr 8, 2013 at 9:36 AM, Anja Jentzsch <[email protected]> wrote:
>>> >>>
>>> >>> Hi Dimitris,
>>> >>>
>>> >>>
>>> >>> On Apr 8, 2013, at 8:29, Dimitris Kontokostas <[email protected]>
>>> >>> wrote:
>>> >>>
>>> >>> Hi JC,
>>> >>>
>>> >>>
>>> >>> On Sun, Apr 7, 2013 at 11:55 PM, Jona Christopher Sahnwaldt
>>> >>> <[email protected]> wrote:
>>> >>>>
>>> >>>> Hi Dimitris,
>>> >>>>
>>> >>>> a lot of important remarks. I think we should discuss this in
>>> >>>> detail.
>>> >>>>
>>> >>>> On 7 April 2013 21:38, Dimitris Kontokostas <[email protected]>
>>> >>>> wrote:
>>> >>>> > Hi,
>>> >>>> >
>>> >>>> > I disagree with this approach and I believe that if we use this as
>>> >>>> > our
>>> >>>> > main
>>> >>>> > strategy we will end up lacking in quality & completeness.
>>> >>>> >
>>> >>>> > Let's say that we will manage to handle {{#property P123}} or
>>> >>>> > {{#property
>>> >>>> > property name}} correctly & very efficiently. What will we do for
>>> >>>> > templates
>>> >>>> > like [1],
>>> >>>>
>>> >>>> I would think such templates are like many others for which we
>>> >>>> programmed special rules in Scala, like unit conversion templates
>>> >>>> etc.
>>> >>>> We could add special rules for templates that handle Wikidata, too.
>>> >>>> Not that I like this approach very much, but it worked (more or
>>> >>>> less)
>>> >>>> in the past.
>>> >>>>
>>> >>>> > Lua scripts that use such templates
>>> >>>>
>>> >>>> For DBpedia, Lua scripts don't really differ from template
>>> >>>> definitions. We don't really parse them or use them in any way. If
>>> >>>> necessary, we try to reproduce their function in Scala. At least
>>> >>>> that's how we dealt with them in the past. Again, not beautiful, but
>>> >>>> also not a new problem.
>>> >>>>
>>> >>>> > or for data in Wikidata that
>>> >>>> > are not referenced from Wikipedia at all?
>>> >>>>
>>> >>>> We would lose that data, that's right.
>>> >>>
>>> >>>
>>> >>> I know that we could achieve all this, but it would take too much
>>> >>> effort to get this 100% right, and it would come with many bugs at
>>> >>> the beginning.
>>> >>> My point is that the data is already there and very well structured;
>>> >>> why do we need to parse templates & Lua scripts just to get it from
>>> >>> Wikidata in the end?
>>> >>>
>>> >>>
>>> >>> There are two ways to integrate Wikidata in Wikipedia: Lua scripts or
>>> >>> the
>>> >>> inclusion syntax. So it would be neat to cover both.
>>> >>
>>> >>
>>> >> Sure, I agree; template rendering is a feature we have wanted (and
>>> >> users have asked for) for many years.
>>> >> We'll have to implement a MediaWiki rendering engine in Scala that
>>> >> could be useful for many, many things, but I don't think that Wikidata
>>> >> is the reason we should build it.
>>> >>
>>> >> I don't know Lua, or whether this is valid syntax, but I'd expect
>>> >> something similar from hard-core Wikipedians sometime soon:
>>> >>
>>> >> for (p in properties)
>>> >>   if (condition1 && condition2 && condition3)
>>> >>     load "{{#property p}}"
>>> >>
>>> >> So we will either miss a lot of data or put too much effort into
>>> >> something that is already very well structured.
>>> >> At least at this point, where nothing is clear yet.
>>> >>
>>> >> Cheers,
>>> >> Dimitris
>>> >>
>>> >>>
>>> >>> Cheers,
>>> >>> Anja
>>> >>>
>>> >>>
>>> >>>>
>>> >>>>
>>> >>>> >
>>> >>>> > Maybe the lookup approach will give us some improvement over our
>>> >>>> > next release (if we manage to implement it by then). Most of the
>>> >>>> > data is still in Wikipedia, and Lua scripts & Wikidata templates
>>> >>>> > are not so complex yet.
>>> >>>> > But in the following release (in 1+ year) everything will be
>>> >>>> > completely different again. The reason is that Wikidata started
>>> >>>> > operations exactly one year ago and partly went into production
>>> >>>> > about 2 months ago, so I'd expect a very big boost in the following
>>> >>>> > months.
>>> >>>>
>>> >>>> I think so too.
>>> >>>>
>>> >>>> > My point is that Wikidata is a completely new source and we should
>>> >>>> > see
>>> >>>> > it as
>>> >>>> > such. Trying to re-parse already structured data will end up in a
>>> >>>> > very
>>> >>>> > complicated design that we might end up not using at all.
>>> >>>>
>>> >>>> What do you mean with "re-parse already structured data"?
>>> >>>>
>>> >>>> > On the other hand, Wikidata data, although well structured, can
>>> >>>> > still be compared to our raw infobox extractor (regarding naming
>>> >>>> > variance).
>>> >>>>
>>> >>>> You mean naming variance of properties? I would expect Wikidata to
>>> >>>> be
>>> >>>> much better than Wikipedia in this respect. I think that's one of
>>> >>>> the
>>> >>>> goals of Wikidata: to have one single property for birth date and
>>> >>>> use
>>> >>>> this property for all types of persons. Apparently, to add a new
>>> >>>> Wikidata property, one must go through a community process [1].
>>> >>>
>>> >>>
>>> >>> I don't have the link, but I read that there is no restriction on
>>> >>> that. The goal is to provide structured data, and the community will
>>> >>> need to handle duplicates.
>>> >>> This is yet another Wikipedia community, so even if it is a lot
>>> >>> stricter, I'd expect variations here too.
>>> >>>
>>> >>>>
>>> >>>>
>>> >>>> > I suggest
>>> >>>> > that we focus on mediating this data to our DBpedia ontology
>>> >>>>
>>> >>>> This is the really interesting stuff. How could we do this? Will we
>>> >>>> let users of the mappings wiki define mappings between Wikidata
>>> >>>> properties and DBpedia ontology properties? There are a lot of
>>> >>>> possibilities.
>>> >>>
>>> >>>
>>> >>> Yup, many interesting possibilities :) The tricky part will be the
>>> >>> classes, but this is a GSoC idea, so the students will have to figure
>>> >>> this out.
>>> >>> I was also thinking of a Greasemonkey script where mappers could
>>> >>> navigate Wikidata and see (or even do) the mappings right on
>>> >>> Wikidata.org :)
>>> >>>
>>> >>>>
>>> >>>> > and then fusing
>>> >>>> > it with data from other DBpedia-language editions.
>>> >>>>
>>> >>>> Do you mean merging data that's already on Wikidata with stuff
>>> >>>> that's
>>> >>>> still in Wikipedia pages?
>>> >>>
>>> >>>
>>> >>> The simplest thing we could do is the following:
>>> >>> let's say Q1 is a Wikidata item linking to article W1, and Wikidata
>>> >>> property P1 is mapped to dbpedia-owl:birthDate.
>>> >>> For the statement Q1 P1 "1/1/2000", we could assume W1 birthDate
>>> >>> "1/1/2000" and load the latter into dbpedia.org.
>>> >>> Even without any intelligence at all, this could give very good results.
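The fusion rule just described can be sketched in a few lines of Scala. All names below (Q1, P1, W1, the two maps) are the invented ones from the example, not actual framework code:

```scala
// Sketch of the rule above: a Wikidata statement (Q1, P1, "1/1/2000"), a
// property mapping (P1 -> dbpedia-owl:birthDate) and a sitelink (Q1 -> W1)
// together yield a DBpedia triple. Hypothetical data for illustration only.
object FusionSketch {
  val propertyMappings = Map("P1" -> "dbpedia-owl:birthDate")
  val sitelinks        = Map("Q1" -> "W1")

  // Returns Some((subject, predicate, value)) when both lookups succeed.
  def fuse(item: String, prop: String, value: String): Option[(String, String, String)] =
    for {
      subject   <- sitelinks.get(item)
      predicate <- propertyMappings.get(prop)
    } yield (subject, predicate, value)
}
```

Statements whose item has no sitelink, or whose property is unmapped, simply produce no triple.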
>>> >>>
>>> >>> Cheers,
>>> >>> Dimitris
>>> >>>
>>> >>>>
>>> >>>>
>>> >>>> So much for my specific questions.
>>> >>>>
>>> >>>> The most important question is: where do we expect Wikidata (and
>>> >>>> DBpedia) to be in one, two, three years?
>>> >>>>
>>> >>>> Cheers,
>>> >>>> JC
>>> >>>>
>>> >>>> [1] http://www.wikidata.org/wiki/Wikidata:Property_proposal
>>> >>>>
>>> >>>> >
>>> >>>> > Best,
>>> >>>> > Dimitris
>>> >>>> >
>>> >>>> > [1] http://it.wikipedia.org/wiki/Template:Wikidata
>>> >>>> >
>>> >>>> >
>>> >>>> > On Sun, Apr 7, 2013 at 3:36 AM, Jona Christopher Sahnwaldt
>>> >>>> > <[email protected]>
>>> >>>> > wrote:
>>> >>>> >>
>>> >>>> >> When I hear "database", I think "network", which is of course
>>> >>>> >> several
>>> >>>> >> orders of magnitude slower than a simple map access, but MapDB
>>> >>>> >> looks
>>> >>>> >> really cool. No network calls, just method calls. Nice!
>>> >>>> >>
>>> >>>> >> On 7 April 2013 01:10, Pablo N. Mendes <[email protected]>
>>> >>>> >> wrote:
>>> >>>> >> >
>>> >>>> >> > My point was rather that there are implementations out there
>>> >>>> >> > that support both in-memory and on-disk storage. So there is no
>>> >>>> >> > need to choose between a map and a database, because you can
>>> >>>> >> > also access a database via a map interface.
>>> >>>> >> > http://www.kotek.net/blog/3G_map
>>> >>>> >> >
>>> >>>> >> > JDBM seems to be good both for speed and memory.
>>> >>>> >> >
>>> >>>> >> > Cheers,
>>> >>>> >> > Pablo
>>> >>>> >> >
>>> >>>> >> >
>>> >>>> >> > On Sat, Apr 6, 2013 at 10:41 PM, Jona Christopher Sahnwaldt
>>> >>>> >> > <[email protected]> wrote:
>>> >>>> >> >>
>>> >>>> >> >> On 6 April 2013 15:34, Mohamed Morsey
>>> >>>> >> >> <[email protected]>
>>> >>>> >> >> wrote:
>>> >>>> >> >> > Hi Pablo, Jona, and all,
>>> >>>> >> >> >
>>> >>>> >> >> >
>>> >>>> >> >> > On 04/06/2013 01:56 PM, Pablo N. Mendes wrote:
>>> >>>> >> >> >
>>> >>>> >> >> >
>>> >>>> >> >> > I'd say this topic can safely move out of dbpedia-discussion
>>> >>>> >> >> > and
>>> >>>> >> >> > to
>>> >>>> >> >> > dbpedia-developers now. :)
>>> >>>> >> >> >
>>> >>>> >> >> > I agree with Jona. With one small detail: perhaps it is
>>> >>>> >> >> > better if we don't load everything in memory, if we use a
>>> >>>> >> >> > fast database such as Berkeley DB or JDBM3. They would also
>>> >>>> >> >> > allow you to use in-memory storage when you can splurge, or
>>> >>>> >> >> > disk-backed storage when restricted. What do you think?
>>> >>>> >> >> >
>>> >>>> >> >> >
>>> >>>> >> >> > I agree with Pablo's idea, as it will work in both dump and
>>> >>>> >> >> > live modes.
>>> >>>> >> >> > Actually, for live extraction we already need a lot of
>>> >>>> >> >> > memory, as we have a running Virtuoso instance that should be
>>> >>>> >> >> > updated by the framework, and we have a local mirror of
>>> >>>> >> >> > Wikipedia which uses MySQL as back-end storage.
>>> >>>> >> >> > So, I would prefer saving as much memory as possible.
>>> >>>> >> >>
>>> >>>> >> >> Let's make it pluggable and configurable then. If you're more
>>> >>>> >> >> concerned with speed than memory (as in the dump extraction),
>>> >>>> >> >> use a
>>> >>>> >> >> map. If it's the other way round, use some kind of database.
>>> >>>> >> >>
>>> >>>> >> >> I expect the interface to be very simple: for Wikidata item X
>>> >>>> >> >> give
>>> >>>> >> >> me
>>> >>>> >> >> the value of property Y.
>>> >>>> >> >>
>>> >>>> >> >> The only problem I see is that we currently have no usable
>>> >>>> >> >> configuration in DBpedia. At least for the dump extraction - I
>>> >>>> >> >> don't
>>> >>>> >> >> know about the live extraction. The dump extraction
>>> >>>> >> >> configuration
>>> >>>> >> >> consists of flat files and static fields in some classes,
>>> >>>> >> >> which is
>>> >>>> >> >> pretty awful and would make it rather hard to exchange one
>>> >>>> >> >> implementation of this WikidataQuery interface for another.
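The pluggable interface described above ("for Wikidata item X give me the value of property Y") might look roughly like this. All names are invented for illustration; a real disk-backed implementation would delegate to e.g. Berkeley DB or JDBM/MapDB:

```scala
// Sketch of a pluggable Wikidata lookup: one trait, two interchangeable
// backends, chosen by configuration. Hypothetical names throughout.
trait WikidataQuery {
  // For Wikidata item X, give me the value of property Y.
  def value(item: String, property: String): Option[String]
}

// Fast but memory-hungry: everything in a plain map (dump extraction).
class InMemoryQuery(data: Map[(String, String), String]) extends WikidataQuery {
  def value(item: String, property: String): Option[String] =
    data.get((item, property))
}

// Memory-friendly: delegate each lookup to some disk-backed store (live
// extraction); the store is abstracted as a plain function here.
class DiskBackedQuery(store: (String, String) => Option[String]) extends WikidataQuery {
  def value(item: String, property: String): Option[String] =
    store(item, property)
}
```

Extractors would depend only on the trait, so swapping backends would not touch extraction code.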
>>> >>>> >> >>
>>> >>>> >> >> >
>>> >>>> >> >> >
>>> >>>> >> >> >
>>> >>>> >> >> > Cheers,
>>> >>>> >> >> > Pablo
>>> >>>> >> >> >
>>> >>>> >> >> >
>>> >>>> >> >> > On Fri, Apr 5, 2013 at 10:01 PM, Jona Christopher Sahnwaldt
>>> >>>> >> >> > <[email protected]> wrote:
>>> >>>> >> >> >>
>>> >>>> >> >> >> On 5 April 2013 21:27, Andrea Di Menna <[email protected]>
>>> >>>> >> >> >> wrote:
>>> >>>> >> >> >> > Hi Dimitris,
>>> >>>> >> >> >> >
>>> >>>> >> >> >> > I am not completely getting your point.
>>> >>>> >> >> >> >
>>> >>>> >> >> >> > How would you handle the following example? (supposing
>>> >>>> >> >> >> > the
>>> >>>> >> >> >> > following
>>> >>>> >> >> >> > will be
>>> >>>> >> >> >> > possible with Wikipedia/Wikidata)
>>> >>>> >> >> >> >
>>> >>>> >> >> >> > Suppose you have
>>> >>>> >> >> >> >
>>> >>>> >> >> >> > {{Infobox:Test
>>> >>>> >> >> >> > | name = {{#property:p45}}
>>> >>>> >> >> >> > }}
>>> >>>> >> >> >> >
>>> >>>> >> >> >> > and a mapping
>>> >>>> >> >> >> >
>>> >>>> >> >> >> > {{PropertyMapping | templateProperty = name |
>>> >>>> >> >> >> > ontologyProperty
>>> >>>> >> >> >> > =
>>> >>>> >> >> >> > foaf:name}}
>>> >>>> >> >> >> >
>>> >>>> >> >> >> > what would happen when running the MappingExtractor?
>>> >>>> >> >> >> > Which RDF triples would be generated?
>>> >>>> >> >> >>
>>> >>>> >> >> >> I think there are two questions here, and two very
>>> >>>> >> >> >> different
>>> >>>> >> >> >> approaches.
>>> >>>> >> >> >>
>>> >>>> >> >> >> 1. In the near term, I would expect that Wikipedia
>>> >>>> >> >> >> templates are
>>> >>>> >> >> >> modified like in your example.
>>> >>>> >> >> >>
>>> >>>> >> >> >> How could/should DBpedia deal with this? The simplest
>>> >>>> >> >> >> solution
>>> >>>> >> >> >> seems
>>> >>>> >> >> >> to be that during a preliminary step, we extract data from
>>> >>>> >> >> >> Wikidata
>>> >>>> >> >> >> and store it. During the main extraction, whenever we find
>>> >>>> >> >> >> a
>>> >>>> >> >> >> reference
>>> >>>> >> >> >> to Wikidata, we look it up and generate a triple as usual.
>>> >>>> >> >> >> Not a
>>> >>>> >> >> >> huge
>>> >>>> >> >> >> change.
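The lookup step just described could be sketched like this: resolve every {{#property:pNN}} reference in a page against a store built in the preliminary step. All names and the store shape are assumptions for illustration:

```scala
// Sketch of the two-step approach above: given a pre-built Wikidata store,
// replace each {{#property:pNN}} reference in the wikitext of `item`'s page
// with the stored value. Hypothetical names; not actual framework code.
object PropertyResolver {
  private val ref = """\{\{#property:(p\d+)\}\}""".r

  def resolve(item: String, wikitext: String,
              store: Map[(String, String), String]): String =
    ref.replaceAllIn(wikitext, m =>
      // Keep the reference untouched if the store has no value for it.
      store.getOrElse((item, m.group(1)), m.matched))
}
```

After this pass, the usual infobox extraction sees plain values and generates triples as before.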
>>> >>>> >> >> >>
>>> >>>> >> >> >> 2. In the long run though, when all data is moved to
>>> >>>> >> >> >> Wikidata,
>>> >>>> >> >> >> all
>>> >>>> >> >> >> instances of a certain infobox type will look the same. It
>>> >>>> >> >> >> doesn't
>>> >>>> >> >> >> matter anymore if an infobox is about Germany or Italy,
>>> >>>> >> >> >> because
>>> >>>> >> >> >> they
>>> >>>> >> >> >> all use the same properties:
>>> >>>> >> >> >>
>>> >>>> >> >> >> {{Infobox country
>>> >>>> >> >> >> | capital = {{#property:p45}}
>>> >>>> >> >> >> | population = {{#property:p42}}
>>> >>>> >> >> >> ... etc. ...
>>> >>>> >> >> >> }}
>>> >>>> >> >> >>
>>> >>>> >> >> >> I guess Wikidata already thought of this and has plans to
>>> >>>> >> >> >> then
>>> >>>> >> >> >> replace
>>> >>>> >> >> >> the whole infobox by a small construct that simply
>>> >>>> >> >> >> instructs
>>> >>>> >> >> >> MediaWiki
>>> >>>> >> >> >> to pull all data for this item from Wikidata and display an
>>> >>>> >> >> >> infobox.
>>> >>>> >> >> >> In this case, there will be nothing left to extract for
>>> >>>> >> >> >> DBpedia.
>>> >>>> >> >> >>
>>> >>>> >> >> >> Implementation detail: we shouldn't use a SPARQL store to
>>> >>>> >> >> >> look
>>> >>>> >> >> >> up
>>> >>>> >> >> >> Wikidata data, we should keep them in memory. A SPARQL call
>>> >>>> >> >> >> will
>>> >>>> >> >> >> certainly be at least 100 times slower than a lookup in a
>>> >>>> >> >> >> map,
>>> >>>> >> >> >> but
>>> >>>> >> >> >> probably 10000 times or more. This matters because there
>>> >>>> >> >> >> will be
>>> >>>> >> >> >> hundreds of millions of lookup calls during an extraction.
>>> >>>> >> >> >> Keeping
>>> >>>> >> >> >> all
>>> >>>> >> >> >> inter-language links in memory takes about 4 or 5 GB - not
>>> >>>> >> >> >> much.
>>> >>>> >> >> >> Of
>>> >>>> >> >> >> course, keeping all Wikidata data in memory would take
>>> >>>> >> >> >> between
>>> >>>> >> >> >> 10
>>> >>>> >> >> >> and
>>> >>>> >> >> >> 100 times as much RAM.
>>> >>>> >> >> >>
>>> >>>> >> >> >> Cheers,
>>> >>>> >> >> >> JC
>>> >>>> >> >> >>
>>> >>>> >> >> >> >
>>> >>>> >> >> >> > Cheers
>>> >>>> >> >> >> > Andrea
>>> >>>> >> >> >> >
>>> >>>> >> >> >> >
>>> >>>> >> >> >> > 2013/4/5 Dimitris Kontokostas <[email protected]>
>>> >>>> >> >> >> >>
>>> >>>> >> >> >> >> Hi,
>>> >>>> >> >> >> >>
>>> >>>> >> >> >> >> For me there is no reason to complicate the DBpedia
>>> >>>> >> >> >> >> framework
>>> >>>> >> >> >> >> by
>>> >>>> >> >> >> >> resolving
>>> >>>> >> >> >> >> Wikidata data / templates.
>>> >>>> >> >> >> >> What we could do is (try to) provide a semantic mirror
>>> >>>> >> >> >> >> of Wikidata in, e.g., data.dbpedia.org. We should
>>> >>>> >> >> >> >> simplify it by mapping the
>>> >>>> >> >> >> >> data
>>> >>>> >> >> >> >> to
>>> >>>> >> >> >> >> the
>>> >>>> >> >> >> >> DBpedia
>>> >>>> >> >> >> >> ontology and then use it like any other language edition
>>> >>>> >> >> >> >> we
>>> >>>> >> >> >> >> have
>>> >>>> >> >> >> >> (e.g.
>>> >>>> >> >> >> >> nl.dbpedia.org).
>>> >>>> >> >> >> >>
>>> >>>> >> >> >> >> In dbpedia.org we already aggregate data from other
>>> >>>> >> >> >> >> language
>>> >>>> >> >> >> >> editions.
>>> >>>> >> >> >> >> For
>>> >>>> >> >> >> >> now it is mostly labels & abstracts but we can also fuse
>>> >>>> >> >> >> >> Wikidata
>>> >>>> >> >> >> >> data.
>>> >>>> >> >> >> >> This
>>> >>>> >> >> >> >> way, whatever is missing from the Wikipedia dumps will
>>> >>>> >> >> >> >> be
>>> >>>> >> >> >> >> filled
>>> >>>> >> >> >> >> in
>>> >>>> >> >> >> >> the
>>> >>>> >> >> >> >> end
>>> >>>> >> >> >> >> by the Wikidata dumps
>>> >>>> >> >> >> >>
>>> >>>> >> >> >> >> Best,
>>> >>>> >> >> >> >> Dimitris
>>> >>>> >> >> >> >>
>>> >>>> >> >> >> >>
>>> >>>> >> >> >> >> On Fri, Apr 5, 2013 at 9:49 PM, Julien Plu
>>> >>>> >> >> >> >> <[email protected]> wrote:
>>> >>>> >> >> >> >>>
>>> >>>> >> >> >> >>> OK, thanks for the clarification :-) That's perfect;
>>> >>>> >> >> >> >>> now I'm just waiting for the dump of this data to
>>> >>>> >> >> >> >>> become available.
>>> >>>> >> >> >> >>>
>>> >>>> >> >> >> >>> Best.
>>> >>>> >> >> >> >>>
>>> >>>> >> >> >> >>> Julien Plu.
>>> >>>> >> >> >> >>>
>>> >>>> >> >> >> >>>
>>> >>>> >> >> >> >>> 2013/4/5 Jona Christopher Sahnwaldt <[email protected]>
>>> >>>> >> >> >> >>>>
>>> >>>> >> >> >> >>>> On 5 April 2013 19:59, Julien Plu
>>> >>>> >> >> >> >>>> <[email protected]>
>>> >>>> >> >> >> >>>> wrote:
>>> >>>> >> >> >> >>>> > Hi,
>>> >>>> >> >> >> >>>> >
>>> >>>> >> >> >> >>>> > @Anja: Do you have a blog post or something similar
>>> >>>> >> >> >> >>>> > that talks about an RDF dump of Wikidata?
>>> >>>> >> >> >> >>>>
>>> >>>> >> >> >> >>>>
>>> >>>> >> >> >> >>>> http://meta.wikimedia.org/wiki/Wikidata/Development/RDF
>>> >>>> >> >> >> >>>>
>>> >>>> >> >> >> >>>> @Anja: do you know when RDF dumps are planned to be
>>> >>>> >> >> >> >>>> available?
>>> >>>> >> >> >> >>>>
>>> >>>> >> >> >> >>>> > Will the French Wikidata also provide its
>>> >>>> >> >> >> >>>> > data in RDF?
>>> >>>> >> >> >> >>>>
>>> >>>> >> >> >> >>>> There is only one Wikidata - neither English nor
>>> >>>> >> >> >> >>>> French nor
>>> >>>> >> >> >> >>>> any
>>> >>>> >> >> >> >>>> other
>>> >>>> >> >> >> >>>> language. It's just data. There are labels in
>>> >>>> >> >> >> >>>> different
>>> >>>> >> >> >> >>>> languages,
>>> >>>> >> >> >> >>>> but
>>> >>>> >> >> >> >>>> the data itself is language-agnostic.
>>> >>>> >> >> >> >>>>
>>> >>>> >> >> >> >>>> >
>>> >>>> >> >> >> >>>> > This news interests me greatly.
>>> >>>> >> >> >> >>>> >
>>> >>>> >> >> >> >>>> > Best
>>> >>>> >> >> >> >>>> >
>>> >>>> >> >> >> >>>> > Julien Plu.
>>> >>>> >> >> >> >>>> >
>>> >>>> >> >> >> >>>> >
>>> >>>> >> >> >> >>>> > 2013/4/5 Tom Morris <[email protected]>
>>> >>>> >> >> >> >>>> >>
>>> >>>> >> >> >> >>>> >> On Fri, Apr 5, 2013 at 9:40 AM, Jona Christopher
>>> >>>> >> >> >> >>>> >> Sahnwaldt
>>> >>>> >> >> >> >>>> >> <[email protected]> wrote:
>>> >>>> >> >> >> >>>> >>>
>>> >>>> >> >> >> >>>> >>>
>>> >>>> >> >> >> >>>> >>> thanks for the heads-up!
>>> >>>> >> >> >> >>>> >>>
>>> >>>> >> >> >> >>>> >>> On 5 April 2013 10:44, Julien Plu
>>> >>>> >> >> >> >>>> >>> <[email protected]>
>>> >>>> >> >> >> >>>> >>> wrote:
>>> >>>> >> >> >> >>>> >>> > Hi,
>>> >>>> >> >> >> >>>> >>> >
>>> >>>> >> >> >> >>>> >>> > A few days ago I saw that, since about a month
>>> >>>> >> >> >> >>>> >>> > ago, MediaWiki has allowed creating infoboxes
>>> >>>> >> >> >> >>>> >>> > (or parts of them) with the Lua scripting
>>> >>>> >> >> >> >>>> >>> > language.
>>> >>>> >> >> >> >>>> >>> > http://www.mediawiki.org/wiki/Lua_scripting
>>> >>>> >> >> >> >>>> >>> >
>>> >>>> >> >> >> >>>> >>> > So my question is: if all the data in the
>>> >>>> >> >> >> >>>> >>> > Wikipedia infoboxes is in Lua scripts, will
>>> >>>> >> >> >> >>>> >>> > DBpedia still be able to retrieve all the data
>>> >>>> >> >> >> >>>> >>> > as usual?
>>> >>>> >> >> >> >>>> >>>
>>> >>>> >> >> >> >>>> >>> I'm not 100% sure, and we should look into this,
>>> >>>> >> >> >> >>>> >>> but I
>>> >>>> >> >> >> >>>> >>> think
>>> >>>> >> >> >> >>>> >>> that
>>> >>>> >> >> >> >>>> >>> Lua
>>> >>>> >> >> >> >>>> >>> is only used in template definitions, not in
>>> >>>> >> >> >> >>>> >>> template
>>> >>>> >> >> >> >>>> >>> calls
>>> >>>> >> >> >> >>>> >>> or
>>> >>>> >> >> >> >>>> >>> other
>>> >>>> >> >> >> >>>> >>> places in content pages. DBpedia does not parse
>>> >>>> >> >> >> >>>> >>> template
>>> >>>> >> >> >> >>>> >>> definitions,
>>> >>>> >> >> >> >>>> >>> only content pages. The content pages probably
>>> >>>> >> >> >> >>>> >>> will
>>> >>>> >> >> >> >>>> >>> only
>>> >>>> >> >> >> >>>> >>> change
>>> >>>> >> >> >> >>>> >>> in
>>> >>>> >> >> >> >>>> >>> minor ways, if at all. For example, {{Foo}} might
>>> >>>> >> >> >> >>>> >>> change to
>>> >>>> >> >> >> >>>> >>> {{#invoke:Foo}}. But that's just my preliminary
>>> >>>> >> >> >> >>>> >>> understanding
>>> >>>> >> >> >> >>>> >>> after
>>> >>>> >> >> >> >>>> >>> looking through a few tutorial pages.
>>> >>>> >> >> >> >>>> >>
>>> >>>> >> >> >> >>>> >>
>>> >>>> >> >> >> >>>> >> As far as I can see, the template calls are
>>> >>>> >> >> >> >>>> >> unchanged
>>> >>>> >> >> >> >>>> >> for
>>> >>>> >> >> >> >>>> >> all
>>> >>>> >> >> >> >>>> >> the
>>> >>>> >> >> >> >>>> >> templates which makes sense when you consider that
>>> >>>> >> >> >> >>>> >> some
>>> >>>> >> >> >> >>>> >> of
>>> >>>> >> >> >> >>>> >> the
>>> >>>> >> >> >> >>>> >> templates
>>> >>>> >> >> >> >>>> >> that they've upgraded to use Lua like
>>> >>>> >> >> >> >>>> >> Template:Coord
>>> >>>> >> >> >> >>>> >> are
>>> >>>> >> >> >> >>>> >> used
>>> >>>> >> >> >> >>>> >> on
>>> >>>> >> >> >> >>>> >> almost a
>>> >>>> >> >> >> >>>> >> million pages.
>>> >>>> >> >> >> >>>> >>
>>> >>>> >> >> >> >>>> >> Here are the ones which have been updated so far:
>>> >>>> >> >> >> >>>> >>
>>> >>>> >> >> >> >>>> >>
>>> >>>> >> >> >> >>>> >> https://en.wikipedia.org/wiki/Category:Lua-based_templates
>>> >>>> >> >> >> >>>> >> Performance improvement looks impressive:
>>> >>>> >> >> >> >>>> >>
>>> >>>> >> >> >> >>>> >>
>>> >>>> >> >> >> >>>> >>
>>> >>>> >> >> >> >>>> >>
>>> >>>> >> >> >> >>>> >> https://en.wikipedia.org/wiki/User:Dragons_flight/Lua_performance
>>> >>>> >> >> >> >>>> >>
>>> >>>> >> >> >> >>>> >> Tom
>>> >>>> >> >> >> >>>> >
>>> >>>> >> >> >> >>>> >
>>> >>>> >> >> >> >>>
>>> >>>> >> >> >> >>>
>>> >>>> >> >> >> >>>
>>> >>>> >> >> >> >>>
>>> >>>> >> >> >> >>>
>>> >>>> >> >> >> >>>
>>> >>>> >> >> >> >>>
>>> >>>> >> >> >> >>>
>>> >>>> >> >> >> >>>
>>> >>>> >> >> >> >>> ------------------------------------------------------------------------------
>>> >>>> >> >> >> >>> Minimize network downtime and maximize team
>>> >>>> >> >> >> >>> effectiveness.
>>> >>>> >> >> >> >>> Reduce network management and security costs.Learn how
>>> >>>> >> >> >> >>> to
>>> >>>> >> >> >> >>> hire
>>> >>>> >> >> >> >>> the most talented Cisco Certified professionals. Visit
>>> >>>> >> >> >> >>> the
>>> >>>> >> >> >> >>> Employer Resources Portal
>>> >>>> >> >> >> >>>
>>> >>>> >> >> >> >>>
>>> >>>> >> >> >> >>> http://www.cisco.com/web/learning/employer_resources/index.html
>>> >>>> >> >> >> >>> _______________________________________________
>>> >>>> >> >> >> >>> Dbpedia-discussion mailing list
>>> >>>> >> >> >> >>> [email protected]
>>> >>>> >> >> >> >>>
>>> >>>> >> >> >> >>>
>>> >>>> >> >> >> >>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>> >>>> >> >> >> >>>
>>> >>>> >> >> >> >>
>>> >>>> >> >> >> >>
>>> >>>> >> >> >> >>
>>> >>>> >> >> >> >> --
>>> >>>> >> >> >> >> Kontokostas Dimitris
>>> >>>> >> >> >> >>
>>> >>>> >> >> >> >>
>>> >>>> >> >> >> >>
>>> >>>> >> >> >> >>
>>> >>>> >> >> >> >>
>>> >>>> >> >> >> >>
>>> >>>> >> >> >> >>
>>> >>>> >> >> >
>>> >>>> >> >> >
>>> >>>> >> >> >
>>> >>>> >> >> >
>>> >>>> >> >> > --
>>> >>>> >> >> >
>>> >>>> >> >> > Pablo N. Mendes
>>> >>>> >> >> > http://pablomendes.com
>>> >>>> >> >> >
>>> >>>> >> >> >
>>> >>>> >> >> > --
>>> >>>> >> >> > Kind Regards
>>> >>>> >> >> > Mohamed Morsey
>>> >>>> >> >> > Department of Computer Science
>>> >>>> >> >> > University of Leipzig
>>> >>>> >> >
>>> >>>> >> >
>>> >>>> >> >
>>> >>>> >> >
>>> >>>> >> > --
>>> >>>> >> >
>>> >>>> >> > Pablo N. Mendes
>>> >>>> >> > http://pablomendes.com
>>> >>>> >> _______________________________________________
>>> >>>> >> Dbpedia-developers mailing list
>>> >>>> >> [email protected]
>>> >>>> >> https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
>>> >>>> >
>>> >>>> >
>>> >>>> >
>>> >>>> >
>>> >>>> > --
>>> >>>> > Kontokostas Dimitris
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Kontokostas Dimitris
>>>
>>>
>>
>>
>>
>>
>> --
>>
>> Pablo N. Mendes
>> http://pablomendes.com
>
>
>
>
> --
> Kontokostas Dimitris

------------------------------------------------------------------------------
Precog is a next-generation analytics platform capable of advanced
analytics on semi-structured data. The platform includes APIs for building
apps and a phenomenal toolset for data science. Developers can use
our toolset for easy data analysis & visualization. Get a free account!
http://www2.precog.com/precogplatform/slashdotnewsletter
_______________________________________________
Dbpedia-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
