Re: [Analytics] Requesting access to Wikimedia Pageview Dumps for Research

2016-03-02 Thread Bo Han
Hi, I noticed the maintenance email was announced at https://lists.wikimedia.org/pipermail/xmldatadumps-l/2016-March/001262.html but it'd be helpful to CC this list as well. Bo On Wed, Mar 2, 2016 at 11:26 AM, Toby Negrin wrote: > I believe the dumps server was

Re: [Analytics] Requesting access to Wikimedia Pageview Dumps for Research

2016-03-02 Thread Nuria Ruiz
cc-ing Analytics list and Ariel who maintains dumps. On Wed, Mar 2, 2016 at 8:31 AM, Gonzalo Diaz wrote: > Dear Nuria Ruiz, > > My name is Gonzalo Diaz, and I am a PhD student of Computer Science at the > University of Oxford. You can see my profile here: >

Re: [Analytics] Echo schema eventlogging

2016-03-02 Thread Dan Andreescu
Ok, Schema_talk page updated and task filed with Jaime cc-ed (he's the one that has the permits to do this): https://meta.wikimedia.org/wiki/Schema_talk:Echo I was about to say I'm cc-ing the analytics list when I see that, apparently, Roan's email address is analytics@lists.wikimedia.org. Huh?

Re: [Analytics] Echo schema eventlogging

2016-03-02 Thread Dan Andreescu
K, I'll delete Schema:Edit :) just kidding. Ok, so we will just set the policy for Schema:Echo to purge after 90 days, so the data will delete itself and give y'all time to do any last queries you might want.

Re: [Analytics] [Engineering] Hadoop - Last week data needs to be backfilled

2016-03-02 Thread Joseph Allemandou
After meeting with the team: - Encoding issue was due to locale wrongly set on some machines (but we don't know why) - We will find a way to enforce file.encoding, first looking for a java-global way, if not feasible, a process-local way. - We will NOT spend computing resource on a job trying

Re: [Analytics] Echo schema eventlogging

2016-03-02 Thread Roan Kattouw
On Wed, Mar 2, 2016 at 9:34 AM, Neil P. Quinn wrote: > *Schema:Edit contains no useful information that isn't already in the >> database apart from which button people use to thank each other,* > > > I assume you mean Schema:Echo? :) > YES. Yes. ECHO, not Edit. I saw

Re: [Analytics] Echo schema eventlogging

2016-03-02 Thread Neil P. Quinn
> > *Schema:Edit contains no useful information that isn't already in the > database apart from which button people use to thank each other,* I assume you mean Schema:Echo? :) On Tue, Mar 1, 2016 at 11:58 PM, Roan Kattouw wrote: > [Reviving old thread] > > I was

Re: [Analytics] [Ops] Dark traffic

2016-03-02 Thread John Mark Vandenberg
On Tue, Mar 1, 2016 at 10:49 PM, Dario Taraborelli wrote: > This change should fix this, while preserving the privacy of our readers > browsing content over HTTPS. That depends greatly on what you mean by readers' privacy. By definition referrers violate the privacy

Re: [Analytics] [Engineering] Hadoop - Last week data needs to be backfilled

2016-03-02 Thread Ori Livneh
So: what is the plan for making sure this doesn't happen the next time around? :) On Tue, Mar 1, 2016 at 5:26 AM, Joseph Allemandou wrote: > Hi, > > *TL;DR: Please don't use hive / spark / hadoop before next week.* > > Last week the Analytics Team performed an

Re: [Analytics] [Engineering] Hadoop - Last week data needs to be backfilled

2016-03-02 Thread Joseph Allemandou
Hi Tilman, Your assumption is correct, you can trust projectview_hourly :) On Wed, Mar 2, 2016 at 4:22 AM, Tilman Bayer wrote: > Thanks Joseph! Is it reasonable to assume that the aggregate data in > projectview_hourly >