Apologies! I realized it was Christmas Eve, but I by no means meant to rush this conversation. Take as long as you like to reply to the thread, and enjoy your holidays everyone :) I'll poke the thread again after the New Year. Happy Holidays!
On Thu, Dec 24, 2015 at 9:21 AM, Erik Zachte <ezac...@wikimedia.org> wrote:

> Dan, thanks for raising the issue (a bit less for raising it on X-mas eve
> ;-) (just kidding, mostly)
>
> Frankly I don't see much use for the earlier releases at all. The newest
> version has been kept very much downward compatible, so migration of
> clients should be a no-brainer (mostly switching the download URL).
> Upgrading those same clients to also use the new additional counts is a
> bit more work, as the coding scheme is tedious (as a result of that
> downward compatibility). But that upgrading could be done later.
>
> I propose to deprecate both earlier sets, set an end date for updating
> those, e.g. July 1, publish that widely, and offer support with
> migration. If people feel otherwise please chime in. Keeping the existing
> files is another matter, we should do so of course.
>
> About my aggregation datasets, it's just that: an aggregation of hourly
> files into daily and monthly aggregates, with extreme compression while
> retaining hourly precision, and adjusting for missing data (by
> extrapolation). These files are ideal for batch processes and lean
> downloads, and archiving for the longer haul.
>
> Reworking the datasets, in whatever way, with categories as part of the
> scheme sounds like a major overhaul, not like cleaning up old stuff.
> Exciting, but best to be done under a separate flag.
>
> Cheers,
> Erik
>
> *From:* Analytics [mailto:analytics-boun...@lists.wikimedia.org] *On
> Behalf Of *Maurice Vergeer
> *Sent:* Thursday, December 24, 2015 15:12
> *To:* A mailing list for the Analytics Team at WMF and everybody who has
> an interest in Wikipedia and analytics.
> *Subject:* Re: [Analytics] [Pageviews] [Technical] Simplifying the
> available static dumps of pageview data
>
> Dear all,
>
> As I just mentioned to Dan in a private email conversation, keeping
> datasets even with imperfect measurements is important, particularly for
> longitudinal analysis.
>
> Also, from what I understand - me being a newbie here - the data are
> stored in separate files. Dan suggested reordering the page into
> categories. Maybe another option is to create more extensive datasets
> with more different measurements in a single data file. On the other
> hand, the files would become even bigger in size. Not an issue for me,
> but for users in the field accessibility (download bandwidth) could
> become an issue.
>
> my two cents
> Maurice
>
> On Thu, Dec 24, 2015 at 2:58 PM, Alex Druk <alex.d...@gmail.com> wrote:
>
> Nothing against this approach!
>
> On Thu, Dec 24, 2015 at 2:55 PM, Dan Andreescu <dandree...@wikimedia.org>
> wrote:
>
> On Thu, Dec 24, 2015 at 8:48 AM, Alex Druk <alex.d...@gmail.com> wrote:
>
> Hi Dan,
>
> Happy holidays!
>
> Good idea to combine these datasets! However we have one more dataset by
> Erik Zachte: http://dumps.wikimedia.org/other/pagecounts-ez/
>
> And that's an important one! But I was thinking we could re-organize the
> page into categories. Erik's dataset could go into a "processed data"
> category or something like that. The three I wanted to talk about on this
> thread are just the raw data.
>
> _______________________________________________
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
> --
> Thank you.
>
> Alex Druk
> alex.d...@gmail.com
> (775) 237-8550 Google voice
>
> --
> ________________________________________________
> Maurice Vergeer
> To contact me, see http://mauricevergeer.nl/node/5
> To see my publications, see http://mauricevergeer.nl/node/1
> ________________________________________________