[Analytics] Echo schema eventlogging

2015-12-15 Thread Nuria Ruiz
Roan: The data for the Echo schema (https://meta.wikimedia.org/wiki/Schema:Echo) is quite large and we are not sure it is even used. Can you confirm either way? If it is no longer used we will stop collecting it. Thanks, Nuria

Re: [Analytics] Echo schema eventlogging

2015-12-15 Thread Nuria Ruiz
> maybe we could nuke data that's more than a year old (or 6 months old or something) from mysql? With eventlogging data we "normally" drop data that is older than 90 days; will this work? Thanks for the prompt response. On Tue, Dec 15, 2015 at 11:27 AM, Roan Kattouw

Re: [Analytics] Echo schema eventlogging

2015-12-15 Thread Madhumitha Viswanathan
I want to mention that data in Hadoop is only available from Aug 27th 2015. Older data is only available in mysql. On Tue, Dec 15, 2015 at 11:27 AM, Roan Kattouw wrote: > If the data is going to be retained but would just become harder to query > (i.e. still in Hadoop

Re: [Analytics] Echo schema eventlogging

2015-12-15 Thread Nuria Ruiz
> We could blacklist this schema from the mysql database, and still keep producing it. It would be available in Hadoop either way. Right, but I would also like to drop the table if it is not being used; if the data is not going to be looked at soonish there is no point in storing it, as it will likely be

Re: [Analytics] Echo schema eventlogging

2015-12-15 Thread Roan Kattouw
If the data is going to be retained but would just become harder to query (i.e. still in Hadoop but not in mysql), maybe we could nuke data that's more than a year old (or 6 months old or something) from mysql? On Tue, Dec 15, 2015 at 9:35 AM, Andrew Otto wrote: > We could

Re: [Analytics] Echo schema eventlogging

2015-12-15 Thread Jonathan Morgan
Hi Nuria! Speaking for *my own particular scenario*, that solution sounds like it will be fine, since I don't plan on immediately performing research with these data. But it's obviously still the Collab team's call here--they likely have needs I know nothing about. Cc'ing Joe Matazzoni in case

Re: [Analytics] How many times has a video been played?

2015-12-15 Thread Federico Leva (Nemo)
Dan Andreescu, 15/12/2015 03:43: Or python if that's easier. https://github.com/hay/wiki-tools/blob/master/etc/mediacounts-stats.py is very easy to use. Download from dumps.wikimedia.org is tragically slow, making any one-time analysis impractical, but /data/scratch/tmp/mediacounts on Labs

Re: [Analytics] How many times has a video been played?

2015-12-15 Thread Dan Andreescu
> Download from dumps.wikimedia.org is tragically slow, making any one-time analysis impractical, but /data/scratch/tmp/mediacounts on Labs has a copy of October data. Nemo, that's really good information, thank you. I'm going to ask a hypothetical and I haven't done my due diligence yet.

Re: [Analytics] Page view API questions regarding user agent

2015-12-15 Thread Oliver Keyes
2-3 weeks? What are you doing, taking /vacations at Christmas/? Unacceptable! More seriously: the work on the API thus far - the data that has been moved in, the responsiveness around bug reports, the intuitive nature of the interface from a client library POV - has been fantastic. I hope you all

[Analytics] pageviews.js—A JavaScript Client Library for the Wikimedia Pageviews API for Node.js and the browser

2015-12-15 Thread Thomas Steiner
Dear all, First and foremost, thanks for making the Wikimedia Pageviews API available; your work is highly appreciated and super useful! As a modest "thank you", I am happy to release the JavaScript client library pageviews.js for Node.js and the browser to make working with this API easy for

Re: [Analytics] Page view API questions regarding user agent

2015-12-15 Thread Felix J. Scholz
I could not agree more. The API implementation has progressed remarkably well over the last few months. Congrats to all involved! On Tue, Dec 15, 2015 at 5:56 AM, Oliver Keyes wrote: > 2-3 weeks? What are you doing, taking /vacations at Christmas/? > Unacceptable! > >

Re: [Analytics] Echo schema eventlogging

2015-12-15 Thread Jonathan Morgan
Hi Nuria, FWIW: Although I'm not using this right now, I could see it being useful for understanding the impact of new notification updates that are coming down the pike.[1][2] What are the costs involved in keeping this schema up? Best, J 1.

Re: [Analytics] Echo schema eventlogging

2015-12-15 Thread Aaron Halfaker
No! Please do not nuke old data. +1 to J-Mo. This will probably be useful for long-term studies of notifications. If I had the time, I'd pick it up right now based on this reminder! I'm happy with having historical data preserved (please make sure that it is) and the MySQL table dropped

Re: [Analytics] Echo schema eventlogging

2015-12-15 Thread Andrew Otto
We could blacklist this schema from the mysql database, and still keep producing it. It would be available in Hadoop either way. > On Dec 15, 2015, at 12:22, Jonathan Morgan wrote: > > Hi Nuria, > > FWIW: Although I'm not using this right now, but I could see it being

Re: [Analytics] Page view API questions regarding user agent

2015-12-15 Thread Dario Taraborelli
On Tue, Dec 15, 2015 at 2:56 AM, Oliver Keyes wrote: > 2-3 weeks? What are you doing, taking /vacations at Christmas/? > Unacceptable! > > More seriously: the work on the API thus far - the data that has been > moved in, the responsiveness around bug reports, the intuitive

Re: [Analytics] Announcing the pageview API

2015-12-15 Thread Andrew Otto
Nice job everybody! > On Dec 14, 2015, at 16:54, Kevin Leduc wrote: > > Hi All, > > It's official: we have a pageview API. You can read more about it on > Wikipedia's blog > http://blog.wikimedia.org/2015/12/14/pageview-data-easily-accessible/ >