Thanks, both. Let's go ahead with English Wikipedia only, with no spider
filtering and no mobile/desktop breakdown, per Oliver.

Michelle – given the aggregation level I am fine moving forward with this
release, but let me know off-thread if you have any questions.

Dario

On Wed, Apr 15, 2015 at 9:53 AM, Oliver Keyes <oke...@wikimedia.org> wrote:

> Dario,
>
> No spider filtering, and no split between mobile and desktop; mobile
> and desktop are grouped.
>
> On 15 April 2015 at 12:46, Hirav Gandhi <hirav.gan...@gmail.com> wrote:
> > e.g. German*
> >
> > I need more coffee.
> >
> >
> >
> > On Wed, Apr 15, 2015 at 9:35 AM, Hirav Gandhi <hirav.gan...@gmail.com>
> > wrote:
> >>
> >> Dario - we just want a representative sample of traffic for a popular
> >> site like Wikipedia. We thought limiting to the English Wikipedia would
> >> be easier.
> >>
> >> If we get aggregated data across all language Wikipedia sites, we would
> >> need some way to tease out which language is being queried when. Some
> >> languages (for e.g. German) we would hypothesize would have more daily
> >> seasonality than languages like English.
> >>
> >>
> >>
> >> On Wed, Apr 15, 2015 at 9:32 AM, Dario Taraborelli
> >> <dtarabore...@wikimedia.org> wrote:
> >>>
> >>> Hirav, Bharath – I also want to hear from you if there's a specific
> >>> reason to ask for English Wikipedia only or if a dataset encompassing
> >>> aggregate pageviews across all Wikimedia properties would do the job.
> >>>
> >>> Dario
> >>>
> >>> On Wed, Apr 15, 2015 at 9:09 AM, Dario Taraborelli
> >>> <dtarabore...@wikimedia.org> wrote:
> >>>>
> >>>> Oliver -- thanks for running a preliminary check, I'm fine releasing
> >>>> this data in aggregate under CC0, I believe it would be valuable for
> >>>> this and other research projects (copying Michelle from Legal).
> >>>>
> >>>> Before we do so, though, I want to confirm the specs: aggregate
> >>>> pageviews per second to English Wikipedia, excluding bot traffic,
> >>>> broken down by access method (mobile web vs desktop site, not apps)
> >>>> for a 60-day period. Oliver – are these the filters you used to
> >>>> identify the data point with the smallest number of observations?
> >>>>
> >>>> Obviously, we will need to take into account this release when we
> >>>> start working on projects such as
> >>>> https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_edits
> >>>> and
> >>>> https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_pageviews
> >>>>
> >>>> Dario
> >>>>
> >>>> On Mon, Apr 13, 2015 at 9:37 PM, Oliver Keyes <oke...@wikimedia.org>
> >>>> wrote:
> >>>>>
> >>>>> Bumping for Dario, per Pine's excellent example :)
> >>>>>
> >>>>> On 13 April 2015 at 22:18, Hirav Gandhi <hirav.gan...@gmail.com>
> >>>>> wrote:
> >>>>> > Oliver: Two months is fine. Thank you so much for your help!
> >>>>> >
> >>>>> >> On Apr 13, 2015, at 4:40 PM, analytics-requ...@lists.wikimedia.org
> >>>>> >> wrote:
> >>>>> >>
> >>>>> >> Today's Topics:
> >>>>> >>
> >>>>> >>   1. Re: Page views on a more frequent than hourly basis (Pine W)
> >>>>> >>   2. Re: Page views on a more frequent than hourly basis (Hirav
> >>>>> >> Gandhi)
> >>>>> >>   3. Re: Page views on a more frequent than hourly basis (Oliver
> >>>>> >> Keyes)
> >>>>> >>
> >>>>> >>
> >>>>> >>
> >>>>> >>
> ----------------------------------------------------------------------
> >>>>> >>
> >>>>> >> Message: 1
> >>>>> >> Date: Mon, 13 Apr 2015 13:34:23 -0700
> >>>>> >> From: Pine W <wiki.p...@gmail.com>
> >>>>> >> To: "A mailing list for the Analytics Team at WMF and everybody
> who
> >>>>> >>       has an  interest in Wikipedia and analytics."
> >>>>> >>       <analytics@lists.wikimedia.org>
> >>>>> >> Subject: Re: [Analytics] Page views on a more frequent than hourly
> >>>>> >>       basis
> >>>>> >> Message-ID:
> >>>>> >>
> >>>>> >> <CAF=
> dyjjzmdfthz+0+lwnhb9m8xuod4wetgcfuxyb9qyf7cy...@mail.gmail.com>
> >>>>> >> Content-Type: text/plain; charset="utf-8"
> >>>>> >>
> >>>>> >> Hi Oliver, re ccing people who are on list, this is the protocol
> >>>>> >> we followed in IEGCom to ping people who are subscribed and
> >>>>> >> mentioned in certain emails but, like many of us, may
> >>>>> >> automatically move emails from lists directly to folders where
> >>>>> >> they may be unread for days. So there is a reason to do this.
> >>>>> >>
> >>>>> >> Thanks,
> >>>>> >>
> >>>>> >> Pine
> >>>>> >>
> >>>>> >> ------------------------------
> >>>>> >>
> >>>>> >> Message: 2
> >>>>> >> Date: Mon, 13 Apr 2015 16:30:43 -0700
> >>>>> >> From: Hirav Gandhi <hirav.gan...@gmail.com>
> >>>>> >> To: analytics@lists.wikimedia.org
> >>>>> >> Subject: Re: [Analytics] Page views on a more frequent than hourly
> >>>>> >>       basis
> >>>>> >> Message-ID:
> >>>>> >>
> >>>>> >> <CANzC_EOvi4MP7G_SsxvW=
> uojpt2vxbnfmhcipqn1pumace-...@mail.gmail.com>
> >>>>> >> Content-Type: text/plain; charset="utf-8"
> >>>>> >>
> >>>>> >> Thanks Oliver!
> >>>>> >>
> >>>>> >> We would like this data for as broad a time period as you can
> >>>>> >> muster. The more days, months and years represented in the
> >>>>> >> dataset, the better.
> >>>>> >>
> >>>>> >>
> >>>>> >>> Okay, so:
> >>>>> >>>
> >>>>> >>> I took an hour from the pageviews logs,[0] and aggregated
> >>>>> >>> pageviews to enwiki (mobile and desktop both) by timestamp, down
> >>>>> >>> to one-second resolution levels. The lowest number of pageviews
> >>>>> >>> to enwiki per second was 2,981.
> >>>>> >>>
> >>>>> >>> So, I don't personally have a problem with generating a release
> >>>>> >>> of:
> >>>>> >>>
> >>>>> >>> 1. Pageviews per second;
> >>>>> >>> 2. To enwiki;
> >>>>> >>> 3. Over $TIME_PERIOD;
> >>>>> >>> 4. grouping the mobile and desktop site
> >>>>> >>>
> >>>>> >>> But Dario or someone should chip in before I touch anything ;p
> >>>>> >>>
> >>>>> >>> [0] 6am yesterday. 6am because it should be low-traffic, right? At
> >>>>> >>> least given our biases towards North America and Europe.
> >>>>> >>>
> >>>>> >>> On 13 April 2015 at 11:54, Oliver Keyes <oke...@wikimedia.org>
> >>>>> >>> wrote:
> >>>>> >>>> Then that sounds much more viable. I'll run a quick test now to
> >>>>> >>>> see how much clustering we'd see at, say, the one-second
> >>>>> >>>> resolution level, and throw it out here so we can make more
> >>>>> >>>> informed decisions about a data release on this.
> >>>>> >>>>
> >>>>> >>>> On 13 April 2015 at 08:08, Hirav Gandhi <hirav.gan...@gmail.com>
> >>>>> >>>> wrote:
> >>>>> >>>>> Hi Oliver,
> >>>>> >>>>>
> >>>>> >>>>> Re: Hirav: would you be looking for temporally /and/
> >>>>> >>>>> contextually granular pageviews, i.e. "a view to X page at Y
> >>>>> >>>>> time", or just temporally granular, so "a view to a page on
> >>>>> >>>>> enwiki at X time"? If the latter you've got more of a shot, I
> >>>>> >>>>> suspect.
> >>>>> >>>>>
> >>>>> >>>>> I only want the latter - I am not concerned with the context so
> >>>>> >>>>> much as just “a view to a page on enwiki at X time.”
> >>>>> >>>>>
> >>>>> >>>>> Hirav
> >>>>> >>>>>
> >>>>> >>>>>
> >>>>> >>>>> On Apr 13, 2015, at 5:00 AM,
> >>>>> >>>>> analytics-requ...@lists.wikimedia.org
> >>>>> >>> wrote:
> >>>>> >>>>>
> >>>>> >>>>> Today's Topics:
> >>>>> >>>>>
> >>>>> >>>>>  1. Re: Page views on a more frequent than hourly basis (Pine
> W)
> >>>>> >>>>>  2. Re: Page views on a more frequent than hourly basis (Oliver
> >>>>> >>>>> Keyes)
> >>>>> >>>>>
> >>>>> >>>>>
> >>>>> >>>>>
> >>>>> >>>>>
> ----------------------------------------------------------------------
> >>>>> >>>>>
> >>>>> >>>>> Message: 1
> >>>>> >>>>> Date: Mon, 13 Apr 2015 00:47:31 -0700
> >>>>> >>>>> From: Pine W <wiki.p...@gmail.com>
> >>>>> >>>>> To: "A mailing list for the Analytics Team at WMF and everybody
> >>>>> >>>>> who
> >>>>> >>>>> has an interest in Wikipedia and analytics."
> >>>>> >>>>> <analytics@lists.wikimedia.org>
> >>>>> >>>>> Cc: Bharath Sitaraman <bharath1...@gmail.com>
> >>>>> >>>>> Subject: Re: [Analytics] Page views on a more frequent than
> >>>>> >>>>> hourly
> >>>>> >>>>> basis
> >>>>> >>>>> Message-ID:
> >>>>> >>>>>
> >>>>> >>>>> <CAF=
> dyjgnut+t6n6mujq16duyiwp7et6ruht3_-tzdnsep+2...@mail.gmail.com>
> >>>>> >>>>> Content-Type: text/plain; charset="utf-8"
> >>>>> >>>>>
> >>>>> >>>>>
> >>>>> >>>>> Hi,
> >>>>> >>>>>
> >>>>> >>>>> This issue of pageview data granularity has been discussed
> >>>>> >>>>> before, and the answer has been that hourly is the smallest
> >>>>> >>>>> increment allowed to be revealed publicly, for privacy reasons.
> >>>>> >>>>>
> >>>>> >>>>> I believe that the person you will want to discuss your request
> >>>>> >>>>> with is Toby, who I have cc'd here.
> >>>>> >>>>>
> >>>>> >>>>> Pine
> >>>>> >>>>> On Apr 13, 2015 12:11 AM, "Hirav Gandhi" <hirav.gan...@gmail.com>
> >>>>> >>>>> wrote:
> >>>>> >>>>>
> >>>>> >>>>> Hi Wikimedia Analytics Team,
> >>>>> >>>>>
> >>>>> >>>>> My colleague Bharath and I are doing research on dynamic server
> >>>>> >>>>> allocation algorithms and we were looking for a suitable
> >>>>> >>>>> dataset to test our predictive algorithm on. We noticed that
> >>>>> >>>>> Wikimedia has an amazing data set of hourly page views, but we
> >>>>> >>>>> were looking for something a bit more granular, such as
> >>>>> >>>>> aggregated page requests to English Wikipedia on a
> >>>>> >>>>> minute-by-minute or second-by-second basis if possible.
> >>>>> >>>>>
> >>>>> >>>>> We are more than happy to pore through any raw data you might
> >>>>> >>>>> have that would help us calculate page requests at this granular
> >>>>> >>>>> level. Please let us know if it would be possible to get such
> >>>>> >>>>> data and if so how. Thank you in advance for your help.
> >>>>> >>>>>
> >>>>> >>>>> Best,
> >>>>> >>>>>
> >>>>> >>>>> Hirav Gandhi
> >>>>> >>>>> _______________________________________________
> >>>>> >>>>> Analytics mailing list
> >>>>> >>>>> Analytics@lists.wikimedia.org
> >>>>> >>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
> >>>>> >>>>>
> >>>>> >>>>> ------------------------------
> >>>>> >>>>>
> >>>>> >>>>> Message: 2
> >>>>> >>>>> Date: Mon, 13 Apr 2015 06:39:45 -0400
> >>>>> >>>>> From: Oliver Keyes <oke...@wikimedia.org>
> >>>>> >>>>> To: "A mailing list for the Analytics Team at WMF and everybody
> >>>>> >>>>> who
> >>>>> >>>>> has an interest in Wikipedia and analytics."
> >>>>> >>>>> <analytics@lists.wikimedia.org>
> >>>>> >>>>> Cc: Bharath Sitaraman <bharath1...@gmail.com>
> >>>>> >>>>> Subject: Re: [Analytics] Page views on a more frequent than
> >>>>> >>>>> hourly
> >>>>> >>>>> basis
> >>>>> >>>>> Message-ID:
> >>>>> >>>>>
> >>>>> >>>>> <CAAUQgdDsnHd8s+ACL-XBtXBz6OO-T04CcJfnGfqwrYAV-=
> h...@mail.gmail.com>
> >>>>> >>>>> Content-Type: text/plain; charset=UTF-8
> >>>>> >>>>>
> >>>>> >>>>>
> >>>>> >>>>> Preeetty sure that Toby is on the analytics list, Pine. He's
> >>>>> >>>>> the director of analytics.
> >>>>> >>>>>
> >>>>> >>>>> Hirav: would you be looking for temporally /and/ contextually
> >>>>> >>>>> granular pageviews, i.e. "a view to X page at Y time", or just
> >>>>> >>>>> temporally granular, so "a view to a page on enwiki at X time"?
> >>>>> >>>>> If the latter you've got more of a shot, I suspect.
> >>>>> >>>>>
> >>>>> >>>>>
> >>>>> >>>>> --
> >>>>> >>>>> Oliver Keyes
> >>>>> >>>>> Research Analyst
> >>>>> >>>>> Wikimedia Foundation
> >>>>> >>>>>
> >>>>> >>>>>
> >>>>> >>>>>
> >>>>> >>>>> ------------------------------
> >>>>> >>>>>
> >>>>> >>>>> _______________________________________________
> >>>>> >>>>> Analytics mailing list
> >>>>> >>>>> Analytics@lists.wikimedia.org
> >>>>> >>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
> >>>>> >>>>>
> >>>>> >>>>>
> >>>>> >>>>> End of Analytics Digest, Vol 38, Issue 21
> >>>>> >>>>> *****************************************
> >>>>> >>>>>
> >>>>> >>>>>
> >>>>> >>>>>
> >>>>> >>>>> _______________________________________________
> >>>>> >>>>> Analytics mailing list
> >>>>> >>>>> Analytics@lists.wikimedia.org
> >>>>> >>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
> >>>>> >>>>>
> >>>>> >>>>
> >>>>> >>>>
> >>>>> >>>>
> >>>>> >>>> --
> >>>>> >>>> Oliver Keyes
> >>>>> >>>> Research Analyst
> >>>>> >>>> Wikimedia Foundation
> >>>>> >>>
> >>>>> >>>
> >>>>> >>>
> >>>>> >>> --
> >>>>> >>> Oliver Keyes
> >>>>> >>> Research Analyst
> >>>>> >>> Wikimedia Foundation
> >>>>> >>>
> >>>>> >>>
> >>>>> >>>
> >>>>> >>> ------------------------------
> >>>>> >>>
> >>>>> >>> _______________________________________________
> >>>>> >>> Analytics mailing list
> >>>>> >>> Analytics@lists.wikimedia.org
> >>>>> >>> https://lists.wikimedia.org/mailman/listinfo/analytics
> >>>>> >>>
> >>>>> >>
> >>>>> >> ------------------------------
> >>>>> >>
> >>>>> >> Message: 3
> >>>>> >> Date: Mon, 13 Apr 2015 19:40:04 -0400
> >>>>> >> From: Oliver Keyes <oke...@wikimedia.org>
> >>>>> >> To: "A mailing list for the Analytics Team at WMF and everybody
> who
> >>>>> >>       has an  interest in Wikipedia and analytics."
> >>>>> >>       <analytics@lists.wikimedia.org>
> >>>>> >> Subject: Re: [Analytics] Page views on a more frequent than hourly
> >>>>> >>       basis
> >>>>> >> Message-ID:
> >>>>> >>
> >>>>> >> <
> caauqgdd6z5ussu11vw49fdmbsrhyejxku9yopyserib79j-...@mail.gmail.com>
> >>>>> >> Content-Type: text/plain; charset=UTF-8
> >>>>> >>
> >>>>> >> ....
> >>>>> >>
> >>>>> >>
> >>>>> >> ...years?
> >>>>> >>
> >>>>> >> We have unsampled logs for, ah. 2 months.
> >>>>> >>
> >>>>> >>
> >>>>> >>
> >>>>> >>
> >>>>> >> --
> >>>>> >> Oliver Keyes
> >>>>> >> Research Analyst
> >>>>> >> Wikimedia Foundation
> >>>>> >>
> >>>>> >>
> >>>>> >>
> >>>>> >> ------------------------------
> >>>>> >>
> >>>>> >> _______________________________________________
> >>>>> >> Analytics mailing list
> >>>>> >> Analytics@lists.wikimedia.org
> >>>>> >> https://lists.wikimedia.org/mailman/listinfo/analytics
> >>>>> >>
> >>>>> >>
> >>>>> >> End of Analytics Digest, Vol 38, Issue 24
> >>>>> >> *****************************************
> >>>>> >
> >>>>> >
> >>>>> > _______________________________________________
> >>>>> > Analytics mailing list
> >>>>> > Analytics@lists.wikimedia.org
> >>>>> > https://lists.wikimedia.org/mailman/listinfo/analytics
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Oliver Keyes
> >>>>> Research Analyst
> >>>>> Wikimedia Foundation
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Dario Taraborelli
> >>>> Senior Research Scientist, Research and Data Lead
> >>>> Wikimedia Foundation
> >>>> http://wikimediafoundation.org
> >>>> http://nitens.org/taraborelli
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> Dario Taraborelli
> >>> Senior Research Scientist, Research and Data Lead
> >>> Wikimedia Foundation
> >>> http://wikimediafoundation.org
> >>> http://nitens.org/taraborelli
> >>
> >>
> >
>
>
>
> --
> Oliver Keyes
> Research Analyst
> Wikimedia Foundation
>
> _______________________________________________
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>



-- 
Dario Taraborelli
Senior Research Scientist, Research and Data Lead
Wikimedia Foundation
http://wikimediafoundation.org
http://nitens.org/taraborelli
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
