Oliver: Two months is fine. Thank you so much for your help!

> On Apr 13, 2015, at 4:40 PM, analytics-requ...@lists.wikimedia.org wrote:
> 
> Send Analytics mailing list submissions to
>       analytics@lists.wikimedia.org
> 
> To subscribe or unsubscribe via the World Wide Web, visit
>       https://lists.wikimedia.org/mailman/listinfo/analytics
> or, via email, send a message with subject or body 'help' to
>       analytics-requ...@lists.wikimedia.org
> 
> You can reach the person managing the list at
>       analytics-ow...@lists.wikimedia.org
> 
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Analytics digest..."
> 
> 
> Today's Topics:
> 
>   1. Re: Page views on a more frequent than hourly basis (Pine W)
>   2. Re: Page views on a more frequent than hourly basis (Hirav Gandhi)
>   3. Re: Page views on a more frequent than hourly basis (Oliver Keyes)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Mon, 13 Apr 2015 13:34:23 -0700
> From: Pine W <wiki.p...@gmail.com>
> To: "A mailing list for the Analytics Team at WMF and everybody who
>       has an  interest in Wikipedia and analytics."
>       <analytics@lists.wikimedia.org>
> Subject: Re: [Analytics] Page views on a more frequent than hourly
>       basis
> Message-ID:
>       <CAF=dyjjzmdfthz+0+lwnhb9m8xuod4wetgcfuxyb9qyf7cy...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> Hi Oliver, re: cc'ing people who are on the list: this is the protocol we
> followed in IEGCom to ping people who are subscribed and mentioned in
> certain emails but who, like many of us, may automatically move list
> emails to folders where they may sit unread for days. So there is a
> reason to do this.
> 
> Thanks,
> 
> Pine
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: 
> <https://lists.wikimedia.org/pipermail/analytics/attachments/20150413/aac0ef89/attachment-0001.html>
> 
> ------------------------------
> 
> Message: 2
> Date: Mon, 13 Apr 2015 16:30:43 -0700
> From: Hirav Gandhi <hirav.gan...@gmail.com>
> To: analytics@lists.wikimedia.org
> Subject: Re: [Analytics] Page views on a more frequent than hourly
>       basis
> Message-ID:
>       <CANzC_EOvi4MP7G_SsxvW=uojpt2vxbnfmhcipqn1pumace-...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> Thanks Oliver!
> 
> We would like this data for as broad a time period as you can muster.
> The more days, months, and years represented in the dataset, the better.
> 
> 
>> Okay, so:
>> 
>> I took an hour from the pageviews logs,[0] and aggregated pageviews to
>> enwiki (mobile and desktop both) by timestamp, down to one-second
>> resolution levels. The lowest number of pageviews to enwiki per second
>> was 2,981
>> 
>> So, I don't personally have a problem with generating a release of:
>> 
>> 1. Pageviews per second;
>> 2. To enwiki;
>> 3. Over $TIME_PERIOD;
>> 4. grouping the mobile and desktop site
>> 
>> But Dario or someone should chip in before I touch anything ;p
>> 
>> [0] 6am yesterday. 6am because it should be low-traffic, right? At least
>> given our biases towards North America and Europe
>> 
>> On 13 April 2015 at 11:54, Oliver Keyes <oke...@wikimedia.org> wrote:
>>> Then that sounds much more viable. I'll run a quick test now to see
>>> how much clustering we'd see at, say, the one-second resolution level,
>>> and throw it out here so we can make more informed decisions about a
>>> data release on this.
>>> 
>>> On 13 April 2015 at 08:08, Hirav Gandhi <hirav.gan...@gmail.com> wrote:
>>>> Hi Oliver,
>>>> 
>>>> Re: Hirav: would you be looking for temporally /and/ contextually
>>>> granular pageviews, i.e. "a view to X page at Y time", or just
>>>> temporally granular, so "a view to a page on enwiki at X time"? If the
>>>> latter you've got more of a shot, I suspect.
>>>> 
>>>> I only want the latter - I am not concerned with the context so much as
>>>> just “a view to a page on enwiki at X time.”
>>>> 
>>>> Hirav
>>>> 
>>>> 
>>>> On Apr 13, 2015, at 5:00 AM, analytics-requ...@lists.wikimedia.org wrote:
>>>> 
>>>> Send Analytics mailing list submissions to
>>>> analytics@lists.wikimedia.org
>>>> 
>>>> To subscribe or unsubscribe via the World Wide Web, visit
>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>> or, via email, send a message with subject or body 'help' to
>>>> analytics-requ...@lists.wikimedia.org
>>>> 
>>>> You can reach the person managing the list at
>>>> analytics-ow...@lists.wikimedia.org
>>>> 
>>>> When replying, please edit your Subject line so it is more specific
>>>> than "Re: Contents of Analytics digest..."
>>>> 
>>>> 
>>>> Today's Topics:
>>>> 
>>>>  1. Re: Page views on a more frequent than hourly basis (Pine W)
>>>>  2. Re: Page views on a more frequent than hourly basis (Oliver Keyes)
>>>> 
>>>> 
>>>> ----------------------------------------------------------------------
>>>> 
>>>> Message: 1
>>>> Date: Mon, 13 Apr 2015 00:47:31 -0700
>>>> From: Pine W <wiki.p...@gmail.com>
>>>> To: "A mailing list for the Analytics Team at WMF and everybody who
>>>> has an interest in Wikipedia and analytics."
>>>> <analytics@lists.wikimedia.org>
>>>> Cc: Bharath Sitaraman <bharath1...@gmail.com>
>>>> Subject: Re: [Analytics] Page views on a more frequent than hourly
>>>> basis
>>>> Message-ID:
>>>> <CAF=dyjgnut+t6n6mujq16duyiwp7et6ruht3_-tzdnsep+2...@mail.gmail.com>
>>>> Content-Type: text/plain; charset="utf-8"
>>>> 
>>>> 
>>>> Hi,
>>>> 
>>>> This issue of pageview data granularity has been discussed before, and
>>>> the answer has been that hourly is the smallest increment allowed to be
>>>> revealed publicly, for privacy reasons.
>>>> 
>>>> I believe that the person you will want to discuss your request with is
>>>> Toby, who I have cc'd here.
>>>> 
>>>> Pine
>>>> On Apr 13, 2015 12:11 AM, "Hirav Gandhi" <hirav.gan...@gmail.com> wrote:
>>>> 
>>>> Hi Wikimedia Analytics Team,
>>>> 
>>>> My colleague Bharath and I are doing research on dynamic server
>>>> allocation algorithms and we were looking for a suitable dataset to test
>>>> our predictive algorithm on. We noticed that Wikimedia has an amazing
>>>> data set of hourly page views, but we were looking for something a bit
>>>> more granular, such as aggregated page requests to English Wikipedia on
>>>> a minute by minute basis or second by second basis if possible.
>>>> 
>>>> We are more than happy to pore through any raw data you might have that
>>>> would help us calculate page requests at this granular level. Please let
>>>> us know if it would be possible to get such data and if so how. Thank
>>>> you in advance for your help.
>>>> 
>>>> Best,
>>>> 
>>>> Hirav Gandhi
>>>> _______________________________________________
>>>> Analytics mailing list
>>>> Analytics@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>> 
>>>> -------------- next part --------------
>>>> An HTML attachment was scrubbed...
>>>> URL:
>>>> <https://lists.wikimedia.org/pipermail/analytics/attachments/20150413/a88287b6/attachment-0001.html>
>>>> 
>>>> ------------------------------
>>>> 
>>>> Message: 2
>>>> Date: Mon, 13 Apr 2015 06:39:45 -0400
>>>> From: Oliver Keyes <oke...@wikimedia.org>
>>>> To: "A mailing list for the Analytics Team at WMF and everybody who
>>>> has an interest in Wikipedia and analytics."
>>>> <analytics@lists.wikimedia.org>
>>>> Cc: Bharath Sitaraman <bharath1...@gmail.com>
>>>> Subject: Re: [Analytics] Page views on a more frequent than hourly
>>>> basis
>>>> Message-ID:
>>>> <CAAUQgdDsnHd8s+ACL-XBtXBz6OO-T04CcJfnGfqwrYAV-=h...@mail.gmail.com>
>>>> Content-Type: text/plain; charset=UTF-8
>>>> 
>>>> 
>>>> Preeetty sure that Toby is on the analytics list, Pine. He's the
>>>> director of analytics.
>>>> 
>>>> Hirav: would you be looking for temporally /and/ contextually granular
>>>> pageviews, i.e. "a view to X page at Y time", or just temporally
>>>> granular, so "a view to a page on enwiki at X time"? If the latter
>>>> you've got more of a shot, I suspect.
>>>> 
>>>> 
>>>> --
>>>> Oliver Keyes
>>>> Research Analyst
>>>> Wikimedia Foundation
>>>> 
>>>> 
>>>> 
>>>> ------------------------------
>>>> 
>>>> _______________________________________________
>>>> Analytics mailing list
>>>> Analytics@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>> 
>>>> 
>>>> End of Analytics Digest, Vol 38, Issue 21
>>>> *****************************************
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Analytics mailing list
>>>> Analytics@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Oliver Keyes
>>> Research Analyst
>>> Wikimedia Foundation
>> 
>> 
>> 
>> --
>> Oliver Keyes
>> Research Analyst
>> Wikimedia Foundation
>> 
>> 
>> 
>> ------------------------------
>> 
>> _______________________________________________
>> Analytics mailing list
>> Analytics@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>> 
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: 
> <https://lists.wikimedia.org/pipermail/analytics/attachments/20150413/3a5df491/attachment-0001.html>
> 
> ------------------------------
> 
> Message: 3
> Date: Mon, 13 Apr 2015 19:40:04 -0400
> From: Oliver Keyes <oke...@wikimedia.org>
> To: "A mailing list for the Analytics Team at WMF and everybody who
>       has an  interest in Wikipedia and analytics."
>       <analytics@lists.wikimedia.org>
> Subject: Re: [Analytics] Page views on a more frequent than hourly
>       basis
> Message-ID:
>       <caauqgdd6z5ussu11vw49fdmbsrhyejxku9yopyserib79j-...@mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
> 
> ....
> 
> 
> ...years?
> 
> We have unsampled logs for, ah. 2 months.
> 
> On 13 April 2015 at 19:30, Hirav Gandhi <hirav.gan...@gmail.com> wrote:
>> Thanks Oliver!
>> 
>> We would like this data for as broad a time period as you can muster. The
>> more days, months, and years represented in the dataset, the better.
> 
> 
> 
> -- 
> Oliver Keyes
> Research Analyst
> Wikimedia Foundation
> 
> 
> 
> ------------------------------
> 
> _______________________________________________
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
> 
> 
> End of Analytics Digest, Vol 38, Issue 24
> *****************************************
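
A minimal sketch of the per-second aggregation described in the quoted thread (count views to enwiki in one-second buckets, then look at the lowest bucket), in Python. The record shape, the function names pageviews_per_second and minimum_per_second, and the sample rows are all assumptions made up for illustration; they are not the real log schema or the query that was actually run.

```python
from collections import Counter
from datetime import datetime, timedelta


def pageviews_per_second(records, project="enwiki"):
    """Count views to `project` in one-second buckets.

    `records` is an iterable of (iso_timestamp, project) pairs. This shape
    is hypothetical; mobile and desktop views are assumed to already carry
    the same project label, so they land in the same bucket.
    """
    counts = Counter()
    for timestamp, record_project in records:
        if record_project != project:
            continue
        # Truncate to whole seconds so sub-second requests share a bucket.
        second = datetime.fromisoformat(timestamp).replace(microsecond=0)
        counts[second] += 1
    return counts


def minimum_per_second(counts):
    """Smallest per-second count, treating seconds with no views as zero."""
    if not counts:
        return 0
    first, last = min(counts), max(counts)
    total_seconds = int((last - first).total_seconds()) + 1
    return min(
        counts.get(first + timedelta(seconds=offset), 0)
        for offset in range(total_seconds)
    )


# Made-up sample rows, not real traffic.
sample = [
    ("2015-04-12T06:00:00.120", "enwiki"),
    ("2015-04-12T06:00:00.870", "enwiki"),
    ("2015-04-12T06:00:01.050", "enwiki"),
    ("2015-04-12T06:00:01.300", "dewiki"),
    ("2015-04-12T06:00:02.400", "enwiki"),
    ("2015-04-12T06:00:02.900", "enwiki"),
]
counts = pageviews_per_second(sample)
lowest = minimum_per_second(counts)
print(lowest)
```

On the made-up sample this prints 1, since the middle second has a single enwiki view. The lowest bucket is the number of interest here because it shows how few views any single published data point would cover, which seems to be the concern behind checking "clustering" before a release.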


_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
