Thanks, Gerhard. I will be in touch off-list.

Those JSON dumps are precisely what I am looking for. We have been developing a toolkit that can process them in 12 hours (at least in the tests I have done with the 2020 dumps). I will be happy to share more details with you (or with anyone else who is interested).
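
In case it is useful context: the entity dumps are one big JSON array with one entity per line, so they can be streamed and parsed line by line without ever holding the whole file in memory. Here is a minimal Python sketch of that pattern (the file name and the counting logic are purely illustrative, not our actual toolkit):

    import gzip
    import json

    def iter_entities(path):
        # Stream entities from a wikidata-*-all.json.gz dump, one per line.
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            for line in fh:
                line = line.strip().rstrip(",")  # entity lines end with a comma
                if line in ("[", "]", ""):       # skip the enclosing array brackets
                    continue
                yield json.loads(line)

    counts = {}
    for entity in iter_entities("wikidata-20201116-all.json.gz"):
        kind = entity.get("type", "?")           # "item" or "property"
        counts[kind] = counts.get(kind, 0) + 1
    print(counts)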

Best,

Daniel

On 11/25/2020 7:13 AM, Gerhard Gonter wrote:
On Wed, Nov 25, 2020 at 1:22 PM Daniel Garijo <dgar...@isi.edu> wrote:
Hello,

I am writing this message because I am analyzing the Wikidata JSON dumps
available in the Internet Archive, and I have found that there are no dumps
available after Feb 8th, 2019 (see
https://archive.org/details/wikimediadownloads?and%5B%5D=%22Wikidata%20entity%20dumps%22).
I know the latest dumps are available at
https://dumps.wikimedia.org/wikidatawiki/entities/, but unfortunately
they only cover the last few months.
Which dump files exactly are you looking for? Dumps like

https://dumps.wikimedia.org/wikidatawiki/entities/20201116/wikidata-20201116-all.json.gz

which can also be found on https://dumps.wikimedia.org/other/wikidata/
as 20201116.json.gz?
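
If you end up fetching these yourself, any HTTP client will do; a short Python sketch that streams a file of this size to disk in fixed-size chunks, since it is far too large to hold in memory (wget or curl work just as well):

    import shutil
    import urllib.request

    URL = ("https://dumps.wikimedia.org/wikidatawiki/entities/"
           "20201116/wikidata-20201116-all.json.gz")

    with urllib.request.urlopen(URL) as resp, \
         open("wikidata-20201116-all.json.gz", "wb") as out:
        shutil.copyfileobj(resp, out, length=1 << 20)  # 1 MiB chunks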

[...]
Does anyone on this list know where some of these missing Wikidata dumps
may be found? If anyone has pointers to a server from which they can be
downloaded, I would greatly appreciate it.
If you are looking for these dumps, I have about 8 TB stored on
external disks. Transferring these over the network might be
difficult, however. Please contact me off-list if you need any of
these dumps; maybe we can arrange something.

I'm curious: what are you trying to do with all of these files?
Processing all of them must take months. My processor usually picks
up the dump on Wednesday and takes 80 hours to comb through it. But
my processor is written in Perl; something in C or Rust might be a
lot faster...
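
For what it's worth, one way to cut the wall-clock time without switching languages might be to keep a single gunzip-and-read loop and fan the per-entity parsing out to worker processes. A rough Python sketch of that idea (parse_line() is just a stand-in for the real per-entity work):

    import gzip
    import json
    from multiprocessing import Pool

    def parse_line(line):
        # Per-entity work; here it merely reports the entity type.
        line = line.strip().rstrip(",")
        if line in ("[", "]", ""):
            return None
        return json.loads(line).get("type")

    if __name__ == "__main__":
        with gzip.open("wikidata-20201116-all.json.gz", "rt",
                       encoding="utf-8") as fh, Pool() as pool:
            types = pool.imap_unordered(parse_line, fh, chunksize=10000)
            items = sum(1 for t in types if t == "item")
        print("items:", items)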

regards, Gerhard Gonter
