Hi,
Just wanted to express my belated support for such dumps:
- We encounter the same problem in research, and for efficiency,
reproducibility, and authoritativeness alike, a centralized solution
would be great.
- Besides filtering for existence in Wikipedia, I'd see much potential
in
Hello all!
On Tue, 17 Dec 2019 at 18:15, Aidan Hogan wrote:
>
> Hey all,
>
> As someone who likes to use Wikidata in their research, and likes to
> give students projects relating to Wikidata, I am finding it more and
> more difficult to (recommend to) work with recent versions of Wikidata
>
---
>
> Message: 1
> Date: Thu, 19 Dec 2019 19:15:09 -0300
> From: Aidan Hogan
> To: wikidata@lists.wikimedia.org
> Subject: Re: [Wikidata] Concise/Notable Wikidata Dump
> Message-ID:
> Content-Type: text/plain; charset=utf-8
Hi Aidan,
Since DBpedia has been around for twelve years now, we have spent the
last three years intensively re-engineering it to solve problems like this.
Last week, we finished the Virtuoso DBpedia Docker[1], which works on
Databus Collections[2],[3]. Databus contains different repartitions of
all the
On Sat, Dec 21, 2019 at 6:37 PM Dan Brickley wrote:
> That is also a fine place to record things! I don’t mean to fork the
> discussion. Maybe we could have a call for interested parties in the new year?
Yeah that sounds like a good idea.
Cheers
Lydia
--
Lydia Pintscher -
On Sat, 21 Dec 2019 at 17:25, Lydia Pintscher
wrote:
> On Thu, Dec 19, 2019 at 11:16 PM Aidan Hogan wrote:
> > - @Lydia, good point! I was thinking that filtering by wikilinks will
> > just drop some more obscure nodes (like Q51366847 for example), but had
> > not considered that there are some
On Thu, Dec 19, 2019 at 11:16 PM Aidan Hogan wrote:
> - @Lydia, good point! I was thinking that filtering by wikilinks will
> just drop some more obscure nodes (like Q51366847 for example), but had
> not considered that there are some more general "concepts" that do not
> have a corresponding
Hey all,
Just a general response to all the comments thus far.
- @Marco et al., regarding the WDumper by Benno, this is a very cool
initiative! In fact I spotted it just *after* posting, so I think it
goes quite some way towards addressing the general issue raised.
- @Markus, I partially
On Tue, Dec 17, 2019 at 7:16 PM Aidan Hogan wrote:
>
> Hey all,
>
> As someone who likes to use Wikidata in their research, and likes to
> give students projects relating to Wikidata, I am finding it more and
> more difficult to (recommend to) work with recent versions of Wikidata
> due to the
Hi all,
Yes, Benno's WDumper could be used for this purpose. The motivation for
the whole project was very similar to what Aidan describes. We realised,
though, that there won't be a single good way to build a smaller dump
that would serve every conceivable use in research, which is why the UI
See also this recent discussion/brainstorm on "Wikidata subsetting"
https://docs.google.com/document/d/1MmrpEQ9O7xA6frNk6gceu_IbQrUiEYGI9vcQjDvTL9c/edit#heading=h.7xg3cywpkgfq
In a geographical context, whether or not an item has a Wikipedia entry
has been contemplated as a criterion for
It certainly helps; however, I think Aidan's suggestion goes in the
direction of having an official dump distribution.
Imagine how much CO2 could be spared just by avoiding the computational
resources needed to recreate this dump every time one needs it.
Besides, it standardises the dataset used for
Hi everyone,
Benno (in CC) has recently announced this tool:
https://tools.wmflabs.org/wdumps/
I haven't checked it out yet, but it sounds related to Aidan's inquiry.
Hope this helps.
Cheers,
Marco
On 12/18/19 8:01 AM, Edgard Marx wrote:
+1
On Tue, Dec 17, 2019, 19:14 Aidan Hogan wrote:
> Hey all,
>
> As someone who likes to use Wikidata in their research, and likes to
> give students projects relating to Wikidata, I am finding it more and
> more difficult to (recommend to) work with recent versions of Wikidata
> due to the
Hey all,
As someone who likes to use Wikidata in their research, and likes to
give students projects relating to Wikidata, I am finding it more and
more difficult to (recommend to) work with recent versions of Wikidata
due to the increasing dump sizes, where even the truthy version now
costs