Re: [Analytics] project Cultural Diversity Observatory / accessing analytics hadoop databases

2019-07-08 Thread Nuria Ruiz
> Will there be a release for these two tables? No, sorry, there will not be. The dataset release is about pages and users. To be extra clear, though, it is not a set of tables but a denormalized reconstruction of the edit history. > Could I connect to the Hadoop to see if the queries on pagelinks and

Re: [Analytics] project Cultural Diversity Observatory / accessing analytics hadoop databases

2019-07-08 Thread Marc Miquel
Hello Nuria, This seems like an interesting alternative for some data (page, users, revision). It can really help and make some processes faster (at the moment we have given up running the revision again, as the new user_agent change also made it slower). So we will take a look at it as soon as it is

Re: [Analytics] project Cultural Diversity Observatory / accessing analytics hadoop databases

2019-07-08 Thread Nuria Ruiz
Hello, From your description it seems that your problem (well, your main problem) is not one of computation but rather of data extraction. The labs replicas are not meant for big data extraction jobs, as you have just found out. Neither is Hadoop. Now, our team will soon be releasing a dataset of edit

[Analytics] project Cultural Diversity Observatory / accessing analytics hadoop databases

2019-07-08 Thread Marc Miquel
To whom it may concern, I am writing regarding the project *Cultural Diversity Observatory* and the data we are collecting. In short, this project aims at bridging the content gaps between language editions that relate to cultural and geographical aspects. For this we need to retrieve data

[Analytics] Pageviews and unique devices to a specific set of pages

2019-07-08 Thread Dan Andreescu
Forwarding a quick question from Peter so we can answer it publicly or take advantage of work others have done: [Can we] estimate how many visitors visit pages with equations (i.e., wikitext math tags)? When we're talking about "how many visitors" we're talking about our Unique Devices data