Thanks, both. Let's go ahead with English Wikipedia only, with no spider filtering and no mobile/desktop breakdown, per Oliver.
Michelle – given the aggregation level I am fine moving forward with this release, but let me know off-thread if you have any questions.

Dario

On Wed, Apr 15, 2015 at 9:53 AM, Oliver Keyes <oke...@wikimedia.org> wrote:
> Dario,
>
> No spider filtering, and no split between mobile and desktop; mobile
> and desktop are grouped.
>
> On 15 April 2015 at 12:46, Hirav Gandhi <hirav.gan...@gmail.com> wrote:
> > e.g. German*
> >
> > I need more coffee.
> >
> > On Wed, Apr 15, 2015 at 9:35 AM, Hirav Gandhi <hirav.gan...@gmail.com> wrote:
> >>
> >> Dario - we just want a representative sample of traffic for a popular
> >> site like Wikipedia. We thought limiting to the English Wikipedia would be
> >> easier.
> >>
> >> If we get aggregated data across all language Wikipedia sites, we would
> >> need some way to tease out which language is being queried when. Some
> >> languages (for e.g. German) we would hypothesize would have more daily
> >> seasonality than languages like English.
> >>
> >> On Wed, Apr 15, 2015 at 9:32 AM, Dario Taraborelli <dtarabore...@wikimedia.org> wrote:
> >>>
> >>> Hirav, Bharath – I also want to hear from you if there's a specific
> >>> reason to ask for English Wikipedia only or if a dataset encompassing
> >>> aggregate pageviews across all Wikimedia properties would do the job.
> >>>
> >>> Dario
> >>>
> >>> On Wed, Apr 15, 2015 at 9:09 AM, Dario Taraborelli <dtarabore...@wikimedia.org> wrote:
> >>>>
> >>>> Oliver -- thanks for running a preliminary check, I'm fine releasing
> >>>> this data in aggregate under CC0, I believe it would be valuable for this
> >>>> and other research projects (copying Michelle from Legal).
> >>>>
> >>>> Before we do so, though, I want to confirm the specs: aggregate
> >>>> pageviews per second to English Wikipedia, excluding bot traffic, broken
> >>>> down by access method (mobile web vs desktop site, not apps) for a 60-day
> >>>> period. Oliver – are these the filters you used to identify the data point
> >>>> with the smallest number of observations?
> >>>>
> >>>> Obviously, we will need to take into account this release when we start
> >>>> working on projects such as
> >>>> https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_edits
> >>>> and
> >>>> https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_pageviews
> >>>>
> >>>> Dario
> >>>>
> >>>> On Mon, Apr 13, 2015 at 9:37 PM, Oliver Keyes <oke...@wikimedia.org> wrote:
> >>>>>
> >>>>> Bumping for Dario, per Pine's excellent example :)
> >>>>>
> >>>>> On 13 April 2015 at 22:18, Hirav Gandhi <hirav.gan...@gmail.com> wrote:
> >>>>> > Oliver: Two months is fine. Thank you so much for your help!
> >>>>> >
> >>>>> >> On Apr 13, 2015, at 4:40 PM, analytics-requ...@lists.wikimedia.org wrote:
> >>>>> >>
> >>>>> >> Send Analytics mailing list submissions to
> >>>>> >>     analytics@lists.wikimedia.org
> >>>>> >>
> >>>>> >> To subscribe or unsubscribe via the World Wide Web, visit
> >>>>> >>     https://lists.wikimedia.org/mailman/listinfo/analytics
> >>>>> >> or, via email, send a message with subject or body 'help' to
> >>>>> >>     analytics-requ...@lists.wikimedia.org
> >>>>> >>
> >>>>> >> You can reach the person managing the list at
> >>>>> >>     analytics-ow...@lists.wikimedia.org
> >>>>> >>
> >>>>> >> When replying, please edit your Subject line so it is more specific
> >>>>> >> than "Re: Contents of Analytics digest..."
> >>>>> >>
> >>>>> >> Today's Topics:
> >>>>> >>
> >>>>> >>    1. Re: Page views on a more frequent than hourly basis (Pine W)
> >>>>> >>    2. Re: Page views on a more frequent than hourly basis (Hirav Gandhi)
> >>>>> >>    3. Re: Page views on a more frequent than hourly basis (Oliver Keyes)
> >>>>> >>
> >>>>> >> ----------------------------------------------------------------------
> >>>>> >>
> >>>>> >> Message: 1
> >>>>> >> Date: Mon, 13 Apr 2015 13:34:23 -0700
> >>>>> >> From: Pine W <wiki.p...@gmail.com>
> >>>>> >> Subject: Re: [Analytics] Page views on a more frequent than hourly basis
> >>>>> >>
> >>>>> >> Hi Oliver, re ccing people who are on list, this is the protocol we
> >>>>> >> followed in IEGCom to ping people who are subscribed and mentioned in
> >>>>> >> certain emails but, like many of us, may automatically move emails from
> >>>>> >> lists directly to folders where they may be unread for days. So there is a
> >>>>> >> reason to do this.
> >>>>> >>
> >>>>> >> Thanks,
> >>>>> >>
> >>>>> >> Pine
> >>>>> >>
> >>>>> >> ------------------------------
> >>>>> >>
> >>>>> >> Message: 2
> >>>>> >> Date: Mon, 13 Apr 2015 16:30:43 -0700
> >>>>> >> From: Hirav Gandhi <hirav.gan...@gmail.com>
> >>>>> >> Subject: Re: [Analytics] Page views on a more frequent than hourly basis
> >>>>> >>
> >>>>> >> Thanks Oliver!
> >>>>> >>
> >>>>> >> We would like this data for as broad of a time period as you can muster.
> >>>>> >> The more days, months and years represented in the dataset, the better.
> >>>>> >>
> >>>>> >>> Okay, so:
> >>>>> >>>
> >>>>> >>> I took an hour from the pageviews logs,[0] and aggregated pageviews to
> >>>>> >>> enwiki (mobile and desktop both) by timestamp, down to one-second
> >>>>> >>> resolution levels. The lowest number of pageviews to enwiki per second
> >>>>> >>> was 2,981
> >>>>> >>>
> >>>>> >>> So, I don't personally have a problem with generating a release of:
> >>>>> >>>
> >>>>> >>> 1. Pageviews per second;
> >>>>> >>> 2. To enwiki;
> >>>>> >>> 3. Over $TIME_PERIOD;
> >>>>> >>> 4. grouping the mobile and desktop site
> >>>>> >>>
> >>>>> >>> But Dario or someone should chip in before I touch anything ;p
> >>>>> >>>
> >>>>> >>> [0] 6am yesterday. 6am because it should be low-traffic, right? At least
> >>>>> >>> given our biases towards North America and Europe
> >>>>> >>>
> >>>>> >>> On 13 April 2015 at 11:54, Oliver Keyes <oke...@wikimedia.org> wrote:
> >>>>> >>>> Then that sounds much more viable. I'll run a quick test now to see
> >>>>> >>>> how much clustering we'd see at, say, the one-second resolution level,
> >>>>> >>>> and throw it out here so we can make more informed decisions about a
> >>>>> >>>> data release on this.
> >>>>> >>>>
> >>>>> >>>> On 13 April 2015 at 08:08, Hirav Gandhi <hirav.gan...@gmail.com> wrote:
> >>>>> >>>>> Hi Oliver,
> >>>>> >>>>>
> >>>>> >>>>> Re: Hirav: would you be looking for temporally /and/ contextually granular
> >>>>> >>>>> pageviews, i.e. "a view to X page at Y time", or just temporally granular,
> >>>>> >>>>> so "a view to a page on enwiki at X time"? If the latter you've got more of
> >>>>> >>>>> a shot, I suspect.
> >>>>> >>>>>
> >>>>> >>>>> I only want the latter - I am not concerned with the context so much as just
> >>>>> >>>>> “a view to a page on enwiki at X time.”
> >>>>> >>>>>
> >>>>> >>>>> Hirav
> >>>>> >>>>>
> >>>>> >>>>> On Apr 13, 2015, at 5:00 AM, analytics-requ...@lists.wikimedia.org wrote:
> >>>>> >>>>>
> >>>>> >>>>> Today's Topics:
> >>>>> >>>>>
> >>>>> >>>>>    1. Re: Page views on a more frequent than hourly basis (Pine W)
> >>>>> >>>>>    2. Re: Page views on a more frequent than hourly basis (Oliver Keyes)
> >>>>> >>>>>
> >>>>> >>>>> ----------------------------------------------------------------------
> >>>>> >>>>>
> >>>>> >>>>> Message: 1
> >>>>> >>>>> Date: Mon, 13 Apr 2015 00:47:31 -0700
> >>>>> >>>>> From: Pine W <wiki.p...@gmail.com>
> >>>>> >>>>> Cc: Bharath Sitaraman <bharath1...@gmail.com>
> >>>>> >>>>> Subject: Re: [Analytics] Page views on a more frequent than hourly basis
> >>>>> >>>>>
> >>>>> >>>>> Hi,
> >>>>> >>>>>
> >>>>> >>>>> This issue of pageview data granularity has been discussed before, and the
> >>>>> >>>>> answer has been that hourly is the smallest increment allowed to be
> >>>>> >>>>> revealed publicly, for privacy reasons.
> >>>>> >>>>>
> >>>>> >>>>> I believe that the person you will want to discuss your request with is
> >>>>> >>>>> Toby, who I have cc'd here.
> >>>>> >>>>>
> >>>>> >>>>> Pine
> >>>>> >>>>>
> >>>>> >>>>> On Apr 13, 2015 12:11 AM, "Hirav Gandhi" <hirav.gan...@gmail.com> wrote:
> >>>>> >>>>>
> >>>>> >>>>> Hi Wikimedia Analytics Team,
> >>>>> >>>>>
> >>>>> >>>>> My colleague Bharath and I are doing research on dynamic server allocation
> >>>>> >>>>> algorithms and we were looking for a suitable dataset to test our
> >>>>> >>>>> predictive algorithm on. We noticed that Wikimedia has an amazing data set
> >>>>> >>>>> of hourly page views, but we were looking for something a bit more
> >>>>> >>>>> granular, such as aggregated page requests to English Wikipedia on a
> >>>>> >>>>> minute by minute basis or second by second basis if possible.
> >>>>> >>>>>
> >>>>> >>>>> We are more than happy to pore over any raw data you might have that
> >>>>> >>>>> would help us calculate page requests at this granular level. Please let us
> >>>>> >>>>> know if it would be possible to get such data and if so how. Thank you in
> >>>>> >>>>> advance for your help.
> >>>>> >>>>>
> >>>>> >>>>> Best,
> >>>>> >>>>>
> >>>>> >>>>> Hirav Gandhi
> >>>>> >>>>>
> >>>>> >>>>> ------------------------------
> >>>>> >>>>>
> >>>>> >>>>> Message: 2
> >>>>> >>>>> Date: Mon, 13 Apr 2015 06:39:45 -0400
> >>>>> >>>>> From: Oliver Keyes <oke...@wikimedia.org>
> >>>>> >>>>> Cc: Bharath Sitaraman <bharath1...@gmail.com>
> >>>>> >>>>> Subject: Re: [Analytics] Page views on a more frequent than hourly basis
> >>>>> >>>>>
> >>>>> >>>>> Preeetty sure that Toby is on the analytics list, Pine. He's the
> >>>>> >>>>> director of analytics.
> >>>>> >>>>>
> >>>>> >>>>> Hirav: would you be looking for temporally /and/ contextually granular
> >>>>> >>>>> pageviews, i.e. "a view to X page at Y time", or just temporally
> >>>>> >>>>> granular, so "a view to a page on enwiki at X time"? If the latter
> >>>>> >>>>> you've got more of a shot, I suspect.
> >>>>> >>>>>
> >>>>> >>>>> --
> >>>>> >>>>> Oliver Keyes
> >>>>> >>>>> Research Analyst
> >>>>> >>>>> Wikimedia Foundation
> >>>>> >>>>>
> >>>>> >>>>> End of Analytics Digest, Vol 38, Issue 21
> >>>>> >>>>> *****************************************
> >>>>> >>
> >>>>> >> ------------------------------
> >>>>> >>
> >>>>> >> Message: 3
> >>>>> >> Date: Mon, 13 Apr 2015 19:40:04 -0400
> >>>>> >> From: Oliver Keyes <oke...@wikimedia.org>
> >>>>> >> Subject: Re: [Analytics] Page views on a more frequent than hourly basis
> >>>>> >>
> >>>>> >> ....
> >>>>> >>
> >>>>> >> ...years?
> >>>>> >>
> >>>>> >> We have unsampled logs for, ah. 2 months.
> >>>>> >>
> >>>>> >> End of Analytics Digest, Vol 38, Issue 24
> >>>>> >> *****************************************
>
> --
> Oliver Keyes
> Research Analyst
> Wikimedia Foundation

--
Dario Taraborelli
Senior Research Scientist, Research and Data Lead
Wikimedia Foundation
http://wikimediafoundation.org
http://nitens.org/taraborelli
_______________________________________________ Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
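
For readers following the thread, the check Oliver describes (bucket enwiki pageviews by second over a window, then look at the smallest bucket) can be sketched roughly as below. This is only an illustration under stated assumptions: the (timestamp, project) record shape, the function names, and the toy data are all hypothetical and are not the actual Wikimedia log schema or the tooling Oliver used.

```python
from collections import Counter
from datetime import datetime, timedelta


def pageviews_per_second(records, project="enwiki"):
    """Count pageviews per one-second bucket for a single project.

    `records` is an iterable of (datetime, project) pairs. That shape is an
    assumption made for this sketch, not the real request-log format.
    """
    counts = Counter()
    for ts, proj in records:
        if proj == project:
            # Truncate to whole seconds so sub-second hits share a bucket.
            counts[ts.replace(microsecond=0)] += 1
    return counts


def smallest_bucket(counts, start, end):
    """Return the lowest per-second count in [start, end).

    Seconds with no recorded views count as 0, so an empty second is
    reported rather than silently skipped.
    """
    seconds = int((end - start).total_seconds())
    return min(counts.get(start + timedelta(seconds=i), 0) for i in range(seconds))


# Toy data only: three enwiki hits and one dewiki hit across two seconds.
start = datetime(2015, 1, 1, 6, 0, 0)
records = [
    (start + timedelta(milliseconds=100), "enwiki"),
    (start + timedelta(milliseconds=900), "enwiki"),
    (start + timedelta(seconds=1), "enwiki"),
    (start + timedelta(seconds=1, milliseconds=500), "dewiki"),
]
counts = pageviews_per_second(records)
lowest = smallest_bucket(counts, start, start + timedelta(seconds=2))
print(lowest)
```

The smallest bucket is the figure that matters for the release decision discussed above: the larger the minimum count per second, the less any single second says about an individual reader.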