Apologies!  I realized it was Christmas Eve, but I by no means meant to rush
this conversation.  Take as long as you like to reply to the thread, and
enjoy your holidays, everyone :)  I'll poke the thread again after the New
Year.  Happy Holidays!

On Thu, Dec 24, 2015 at 9:21 AM, Erik Zachte <ezac...@wikimedia.org> wrote:

> Dan, thanks for raising the issue (a bit less for raising it on X-mas eve
> ;-) (just kidding, mostly)
>
>
>
> Frankly I don't see much use for the earlier releases at all. The newest
> version has been kept very much downward compatible, so migration of clients
> should be a no-brainer (mostly switching the download URL). Upgrading those
> same clients to also use the new additional counts is a bit more work, as the
> coding scheme is tedious (as a result of that downward compatibility). But
> that upgrade could be done later.
>
>
>
> I propose to deprecate both earlier sets, and set an end date for updating
> those, e.g. July 1, and publish that widely, and offer support with
> migration. If people feel otherwise please chime in. Keeping the existing
> files is another matter, we should do so of course.
>
>
>
> About my aggregation datasets, it's just that: an aggregation of hourly
> files into daily and monthly aggregates, with extreme compression while
> retaining hourly precision, and adjusting for missing data (by
> extrapolation). These files are ideal for batch processes and lean
> downloads, and archiving for the longer haul.
>
>
>
> Reworking the datasets, in whatever way, with categories as part of the
> scheme sounds like a major overhaul, not like cleaning up old stuff.
> Exciting, but best to be done under a separate flag.
>
>
>
> Cheers,
>
> Erik
>
> *From:* Analytics [mailto:analytics-boun...@lists.wikimedia.org] *On
> Behalf Of *Maurice Vergeer
> *Sent:* Thursday, December 24, 2015 15:12
> *To:* A mailing list for the Analytics Team at WMF and everybody who has
> an interest in Wikipedia and analytics.
> *Subject:* Re: [Analytics] [Pageviews] [Technical] Simplifying the
> available static dumps of pageview data
>
>
>
> Dear all,
>
> As I just mentioned to Dan in a private email conversation, keeping
> datasets even with imperfect measurements is important. Particularly for
> longitudinal analysis.
>
> Also, what I understand - me being a newbie here - is that the data
> are stored in separate files. Dan suggested reordering the page into
> categories. Maybe another option is to create more extensive datasets with
> more measurements in a single data file. On the other hand, the
> files would become even bigger. Not an issue for me, but for users
> in the field, accessibility (download bandwidth) could become an issue.
>
> my two cents
>
> Maurice
>
>
>
>
>
> On Thu, Dec 24, 2015 at 2:58 PM, Alex Druk <alex.d...@gmail.com> wrote:
>
> Nothing against this approach!
>
>
>
> On Thu, Dec 24, 2015 at 2:55 PM, Dan Andreescu <dandree...@wikimedia.org>
> wrote:
>
>
>
>
>
> On Thu, Dec 24, 2015 at 8:48 AM, Alex Druk <alex.d...@gmail.com> wrote:
>
> Hi Dan,
>
> Happy holidays!
>
> Good idea to combine these datasets! However we have one more dataset by
> Erik Zachte : http://dumps.wikimedia.org/other/pagecounts-ez/
>
>
>
> And that's an important one!  But I was thinking we could re-organize the
> page into categories.  Erik's dataset could go into a "processed data"
> category or something like that.  The three I wanted to talk about on this
> thread are just the raw data.
>
>
>
> _______________________________________________
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>
>
>
>
> --
>
> Thank you.
>
> Alex Druk
> alex.d...@gmail.com
> (775) 237-8550 Google voice
>
>
>
>
>
>
> --
>
> ________________________________________________
> Maurice Vergeer
> To contact me, see http://mauricevergeer.nl/node/5
> To see my publications, see http://mauricevergeer.nl/node/1
> ________________________________________________
>
>
>