> Will there be a release for these two tables?
No, sorry, there will not be. The dataset release is about pages and users.
To be extra clear, though, it is not a set of tables but a denormalized
reconstruction of the edit history.
> Could I connect to the Hadoop to see if the queries on pagelinks and
Hello Nuria,
This seems like an interesting alternative for some data (page, users,
revision). It could really help and make some processes faster (for now,
we have given up re-running the revision extraction, as the new user_agent
change also made it slower). So we will take a look at it as soon as it is
Hello,
From your description, it seems that your problem (well, your main problem)
is not one of computation but rather one of data extraction. The labs
replicas are not meant for big data extraction jobs, as you have just found
out. Neither is Hadoop. Now, our team will soon be releasing a dataset of edit
To whom it might concern,
I am writing regarding the project *Cultural Diversity Observatory* and
the data we are collecting. In short, this project aims to bridge the
content gaps between language editions that relate to cultural and
geographical aspects. For this, we need to retrieve data
Forwarding a quick question from Peter so we can answer it publicly or take
advantage of work others have done:
[Can we] estimate how many visitors visit pages with equations (i.e.,
wikitext math tags)?
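One possible first step toward an answer is to flag which pages contain a `<math>` tag in their wikitext and then weight those pages by some traffic measure. The Python sketch below is only an illustration under stated assumptions. The `(wikitext, view_count)` input pairs and the helper names are hypothetical. Per-page views are used as a stand-in for "visitors", since Unique Devices counts may not be available per page. The regex also does not skip tags inside `<nowiki>` blocks or HTML comments, so it can overcount.

```python
import re
from typing import Iterable, Tuple

# Matches an opening <math> tag, with or without attributes
# (e.g. <math display="block">), but not words like <mathematics>.
# Case-insensitive, since wikitext tag names are not case-sensitive.
MATH_TAG = re.compile(r"<math(?:\s[^>]*)?>", re.IGNORECASE)


def has_math_tag(wikitext: str) -> bool:
    """Return True if the wikitext contains at least one <math> tag."""
    return MATH_TAG.search(wikitext) is not None


def math_view_share(pages: Iterable[Tuple[str, int]]) -> float:
    """Fraction of views that went to pages containing a <math> tag.

    `pages` is a hypothetical iterable of (wikitext, view_count) pairs;
    how those pairs are obtained is left open here.
    """
    total = 0
    with_math = 0
    for wikitext, views in pages:
        total += views
        if has_math_tag(wikitext):
            with_math += views
    # Avoid dividing by zero when there are no views at all.
    return with_math / total if total else 0.0


if __name__ == "__main__":
    sample = [
        ("Euler's identity: <math>e^{i\\pi} + 1 = 0</math>", 300),
        ("A page about history with no formulas.", 600),
        ('<math display="block">x^2</math>', 100),
    ]
    print(math_view_share(sample))  # 0.4
```

At scale, the same check would run over a wikitext dump rather than an in-memory list, but the per-page logic stays the same.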
When we're talking about "how many visitors" we're talking about our Unique
Devices data