Oliver: Two months is fine. Thank you so much for your help!

> On Apr 13, 2015, at 4:40 PM, analytics-requ...@lists.wikimedia.org wrote:
>
> Send Analytics mailing list submissions to
> analytics@lists.wikimedia.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.wikimedia.org/mailman/listinfo/analytics
> or, via email, send a message with subject or body 'help' to
> analytics-requ...@lists.wikimedia.org
>
> You can reach the person managing the list at
> analytics-ow...@lists.wikimedia.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Analytics digest..."
>
>
> Today's Topics:
>
> 1. Re: Page views on a more frequent than hourly basis (Pine W)
> 2. Re: Page views on a more frequent than hourly basis (Hirav Gandhi)
> 3. Re: Page views on a more frequent than hourly basis (Oliver Keyes)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 13 Apr 2015 13:34:23 -0700
> From: Pine W <wiki.p...@gmail.com>
> To: "A mailing list for the Analytics Team at WMF and everybody who
> has an interest in Wikipedia and analytics."
> <analytics@lists.wikimedia.org>
> Subject: Re: [Analytics] Page views on a more frequent than hourly
> basis
> Message-ID:
> <CAF=dyjjzmdfthz+0+lwnhb9m8xuod4wetgcfuxyb9qyf7cy...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Oliver, re ccing people who are on list, this is the protocol we
> followed in IEGCom to ping people who are subscribed and mentioned in
> certain emails but, like many of us, may automatically move emails from
> lists directly to folders where they may be unread for days. So there is a
> reason to do this.
>
> Thanks,
>
> Pine
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL:
> <https://lists.wikimedia.org/pipermail/analytics/attachments/20150413/aac0ef89/attachment-0001.html>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 13 Apr 2015 16:30:43 -0700
> From: Hirav Gandhi <hirav.gan...@gmail.com>
> To: analytics@lists.wikimedia.org
> Subject: Re: [Analytics] Page views on a more frequent than hourly
> basis
> Message-ID:
> <CANzC_EOvi4MP7G_SsxvW=uojpt2vxbnfmhcipqn1pumace-...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Thanks Oliver!
>
> We would like this data for as broad of a time period as you can muster.
> The more days, months and year represented in the dataset, the better.
>
>
>> Okay, so:
>>
>> I took an hour from the pageviews logs,[0] and aggregated pageviews to
>> enwiki (mobile and desktop both) by timestamp, down to one-second
>> resolution levels. The lowest number of pageviews to enwiki per second
>> was 2,981
>>
>> So, I don't personally have a problem with generating a release of:
>>
>> 1. Pageviews per second;
>> 2. To enwiki;
>> 3. Over $TIME_PERIOD;
>> 4. grouping the mobile and desktop site
>>
>> But Dario or someone should chip in before I touch anything ;p
>>
>> 6am yesterday. 6am because it should be low-traffic, right? At least
>> given our biases towards north america and europe
>>
>> On 13 April 2015 at 11:54, Oliver Keyes <oke...@wikimedia.org> wrote:
>>> Then that sounds much more viable. I'll run a quick test now to see
>>> how much clustering we'd see at, say, the one-second resolution level,
>>> and throw it out here so we can make more informed decisions about a
>>> data release on this.
>>>
>>> On 13 April 2015 at 08:08, Hirav Gandhi <hirav.gan...@gmail.com> wrote:
>>>> Hi Oliver,
>>>>
>>>> Re: Hirav: would you be looking for temporally /and/ contextually granular
>>>> pageviews, i.e. "a view to X page at Y time", or just temporally granular,
>>>> so "a view to a page on enwiki at X time"? If the latter you've got more of
>>>> a shot, I suspect.
>>>>
>>>> I only want the latter - I am not concerned with the context so much as just
>>>> “a view to a page on enwiki at X time.”
>>>>
>>>> Hirav
>>>>
>>>>
>>>> On Apr 13, 2015, at 5:00 AM, analytics-requ...@lists.wikimedia.org wrote:
>>>>
>>>> Today's Topics:
>>>>
>>>> 1. Re: Page views on a more frequent than hourly basis (Pine W)
>>>> 2. Re: Page views on a more frequent than hourly basis (Oliver Keyes)
>>>>
>>>>
>>>> ----------------------------------------------------------------------
>>>>
>>>> Message: 1
>>>> Date: Mon, 13 Apr 2015 00:47:31 -0700
>>>> From: Pine W <wiki.p...@gmail.com>
>>>> To: "A mailing list for the Analytics Team at WMF and everybody who
>>>> has an interest in Wikipedia and analytics."
>>>> <analytics@lists.wikimedia.org>
>>>> Cc: Bharath Sitaraman <bharath1...@gmail.com>
>>>> Subject: Re: [Analytics] Page views on a more frequent than hourly
>>>> basis
>>>> Message-ID:
>>>> <CAF=dyjgnut+t6n6mujq16duyiwp7et6ruht3_-tzdnsep+2...@mail.gmail.com>
>>>> Content-Type: text/plain; charset="utf-8"
>>>>
>>>>
>>>> Hi,
>>>>
>>>> This issue of pageview data granularity has been discussed before, and the
>>>> answer has been that hourly is the smallest increment allowed to be
>>>> revealed publicly, for privacy reasons.
>>>>
>>>> I believe that the person you will want to discuss your request with is
>>>> Toby, who I have cc'd here.
>>>>
>>>> Pine
>>>> On Apr 13, 2015 12:11 AM, "Hirav Gandhi" <hirav.gan...@gmail.com> wrote:
>>>>
>>>> Hi Wikimedia Analytics Team,
>>>>
>>>> My colleague Bharath and I are doing research on dynamic server allocation
>>>> algorithms and we were looking for a suitable datasets to test our
>>>> predictive algorithm on. We noticed that Wikimedia has an amazing data set
>>>> of hourly page views, but we were looking for something a bit more
>>>> granular, such as aggregated page requests to English Wikipedia on a minute
>>>> by minute basis or second by second basis if possible.
>>>>
>>>> We are more than happy to pour through any raw data you might have that
>>>> would help us calculate page requests at this granular level. Please let us
>>>> know if it would be possible to get such data and if so how. Thank you in
>>>> advance for your help.
>>>>
>>>> Best,
>>>>
>>>> Hirav Gandhi
>>>> _______________________________________________
>>>> Analytics mailing list
>>>> Analytics@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>
>>>> -------------- next part --------------
>>>> An HTML attachment was scrubbed...
>>>> URL:
>>>> <https://lists.wikimedia.org/pipermail/analytics/attachments/20150413/a88287b6/attachment-0001.html>
>>>>
>>>> ------------------------------
>>>>
>>>> Message: 2
>>>> Date: Mon, 13 Apr 2015 06:39:45 -0400
>>>> From: Oliver Keyes <oke...@wikimedia.org>
>>>> To: "A mailing list for the Analytics Team at WMF and everybody who
>>>> has an interest in Wikipedia and analytics."
>>>> <analytics@lists.wikimedia.org>
>>>> Cc: Bharath Sitaraman <bharath1...@gmail.com>
>>>> Subject: Re: [Analytics] Page views on a more frequent than hourly
>>>> basis
>>>> Message-ID:
>>>> <CAAUQgdDsnHd8s+ACL-XBtXBz6OO-T04CcJfnGfqwrYAV-=h...@mail.gmail.com>
>>>> Content-Type: text/plain; charset=UTF-8
>>>>
>>>>
>>>> Preeetty sure that Toby is on the analytics list, Pine. He's the
>>>> director of analytics.
>>>>
>>>> Hirav: would you be looking for temporally /and/ contextually granular
>>>> pageviews, i.e. "a view to X page at Y time", or just temporally
>>>> granular, so "a view to a page on enwiki at X time"? If the latter
>>>> you've got more of a shot, I suspect.
>>>>
>>>> --
>>>> Oliver Keyes
>>>> Research Analyst
>>>> Wikimedia Foundation
>>>>
>>>> ------------------------------
>>>>
>>>> End of Analytics Digest, Vol 38, Issue 21
>>>> *****************************************
>>>
>>> --
>>> Oliver Keyes
>>> Research Analyst
>>> Wikimedia Foundation
>>
>> --
>> Oliver Keyes
>> Research Analyst
>> Wikimedia Foundation
>>
>> ------------------------------
>>
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL:
> <https://lists.wikimedia.org/pipermail/analytics/attachments/20150413/3a5df491/attachment-0001.html>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 13 Apr 2015 19:40:04 -0400
> From: Oliver Keyes <oke...@wikimedia.org>
> To: "A mailing list for the Analytics Team at WMF and everybody who
> has an interest in Wikipedia and analytics."
> <analytics@lists.wikimedia.org>
> Subject: Re: [Analytics] Page views on a more frequent than hourly
> basis
> Message-ID:
> <caauqgdd6z5ussu11vw49fdmbsrhyejxku9yopyserib79j-...@mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> ....
>
>
> ...years?
>
> We have unsampled logs for, ah. 2 months.
>
> On 13 April 2015 at 19:30, Hirav Gandhi <hirav.gan...@gmail.com> wrote:
>> Thanks Oliver!
>>
>> We would like this data for as broad of a time period as you can muster. The
>> more days, months and year represented in the dataset, the better.
>
> --
> Oliver Keyes
> Research Analyst
> Wikimedia Foundation
>
> ------------------------------
>
> End of Analytics Digest, Vol 38, Issue 24
> *****************************************
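
For anyone reading along, the per-second check described in the quoted thread (aggregate requests by timestamp at one-second resolution, then look at the quietest second) might be sketched roughly like this. This is only an illustration, not the query that was actually run: the ISO-8601 timestamp format, the function names `pageviews_per_second` and `lowest_count`, and the sample data are all assumptions made up for the example.

```python
from collections import Counter
from datetime import datetime


def pageviews_per_second(timestamps):
    """Count requests per one-second bucket.

    `timestamps` is an iterable of ISO-8601 strings (an assumed format;
    the real request logs may look different).
    """
    counts = Counter()
    for ts in timestamps:
        # Drop the sub-second part so requests in the same second share a bucket.
        second = datetime.fromisoformat(ts).replace(microsecond=0)
        counts[second] += 1
    return counts


def lowest_count(counts):
    """Smallest per-second count among seconds that saw at least one request."""
    return min(counts.values())


# Made-up sample data, purely for illustration.
sample = [
    "2015-04-12T06:00:00",
    "2015-04-12T06:00:00.250",
    "2015-04-12T06:00:01",
    "2015-04-12T06:00:01.500",
    "2015-04-12T06:00:01.900",
]
counts = pageviews_per_second(sample)
print(lowest_count(counts))
```

Note that a second with no requests at all never appears in `counts`, so on sparse data the minimum should be read with that in mind; at the traffic levels mentioned in the thread that is unlikely to matter.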
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics