*From:* Analytics on behalf of Houcemeddine A. Turki
*Sent:* Tuesday, 9 July 2019 16:12
*To:* A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
*Subject:* Re: [Analytics] project Cultural Diversity Observatory / accessing analytics hadoop databases
Dear Mr.,
I thank you for your efforts. When we were in WikiIndaba 2018, it was
interesting
Marc:
>We'd like to start the formal process to have an active collaboration, as
it seems there is no other solution available
Given that formal collaborations are somewhat hard to obtain (the research
team has only so many resources), my recommendation would be to import the
public data into other
Thanks for your clarification Nuria.
The categorylinks table is working better lately. Computing counts on the
pagelinks table is critical, though, and I'm afraid there is no solution for
that one. I thought about creating a temporary pagelinks table with data
from the dumps for each language edition. But
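The temporary-table idea above could also be done without a database at all: the public `pagelinks` SQL dumps can be streamed and tallied directly. The sketch below is only illustrative and not from the thread; the `count_inbound_links` helper is hypothetical, and the assumed row layout `(pl_from, pl_namespace, pl_title, pl_from_namespace)` should be verified against the `CREATE TABLE` header of the actual dump before use.

```python
import gzip
import re
from collections import Counter

# Row pattern inside the dump's INSERT statements. The column order
# (pl_from, pl_namespace, pl_title, pl_from_namespace) is an assumption;
# check the CREATE TABLE statement at the top of the real dump file.
ROW_RE = re.compile(r"\((\d+),(\d+),'((?:[^'\\]|\\.)*)',(\d+)\)")

def count_inbound_links(lines):
    """Count how many pages link to each (namespace, title) target."""
    counts = Counter()
    for line in lines:
        if not line.startswith("INSERT INTO"):
            continue  # skip DDL, comments, and lock statements
        for _pl_from, ns, title, _from_ns in ROW_RE.findall(line):
            counts[(int(ns), title)] += 1
    return counts

def count_from_dump(path):
    """Stream a gzipped dump (e.g. a *-pagelinks.sql.gz file) line by line."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        return count_inbound_links(f)
```

Streaming line by line keeps memory bounded even for the largest language editions, since nothing but the counter is held in RAM.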
>Will there be a release for these two tables?
No, sorry, there will not be. The dataset release is about pages and users.
To be extra clear though, it is not tables but a denormalized
reconstruction of the edit history.
> Could I connect to the Hadoop to see if the queries on pagelinks and
Hello Nuria,
This seems like an interesting alternative for some data (page, users,
revision). It could really help and speed up some processes (at the moment
we have given up on re-running the revision query, since the new user_agent
change made it slower as well). So we will take a look at it as soon as it is
Hello,
From your description, it seems that your problem is not one of computation
(well, not your main one) but rather data extraction. The labs replicas are
not meant for big data extraction jobs, as you have just found out. Neither
is Hadoop. Now, our team will soon be releasing a dataset of edit