Re: [Analytics] Pageview-complete entries labeled as "-"

2021-03-16 Thread Ogier Maitre
Hi Joseph,

I see. Obviously the process is more complex than I thought.

Thank you for the help.
Regards,
Ogier

On 15 March 2021 at 18:09, Joseph Allemandou <jalleman...@wikimedia.org> wrote:

Hi again Ogier,

> I don't exactly understand the part about the page_id being defined in the 
> request. I thought the page_id was "resolved" based on the page_title being 
> in the uri_query.

This is not how the page_id is set in our traffic datasets :)
We receive the page_id in HTTP-Header, set by the UIs.
We have historically received the values for `desktop` and `mobile-web` pretty 
consistently, but the fact that we receive them for `mobile-app` is new to me :)
I assume that getting data consistently will then be a matter of mobile-app 
updates.

I hope this helps :)
Cheers
Joseph


On Mon, Mar 15, 2021 at 4:29 PM Ogier Maitre <ogier.mai...@unil.ch> wrote:
Hello Joseph,

Thank you for your detailed response.
We suspected curid could be part of the equation here, but it is nice to have 
it confirmed here (at least for a part of the answer).

> The entry appears two times because for one of them there is no page_id 
> defined in the request, therefore it is categorised as different from the one 
> having a page_id defined.

I don't exactly understand the part about the page_id being defined in the 
request. I thought the page_id was "resolved" based on the page_title being in 
the uri_query.
But this is more to satisfy my curiosity, as I'm currently bundling these 
entries with the ones that have a page_id, thanks to the page.sql table. I was 
mainly asking this in the hope of seeing these kinds of entries disappear in 
the future, which could simplify my aggregation process.
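A minimal sketch of that bundling step, assuming a `title -> page_id` mapping built from the page.sql dump (the function name and tuple layout are hypothetical, for illustration only):

```python
def bundle_missing_ids(rows, title_to_id):
    """Sum views per page_id, resolving rows that lack a page_id
    through a title -> page_id mapping (e.g. built from the page.sql
    dump). Illustrative sketch; `rows` holds (title, page_id, views)
    tuples with page_id set to None when absent."""
    totals = {}
    for title, page_id, views in rows:
        if page_id is None:
            page_id = title_to_id.get(title)  # resolve via the page table
        if page_id is not None:               # drop unresolvable rows
            totals[page_id] = totals.get(page_id, 0) + views
    return totals
```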

Thank you again for your answer.
Regards,
Ogier


On 15 March 2021 at 14:10, Joseph Allemandou <jalleman...@wikimedia.org> wrote:

Hello Ogier,
Thanks a lot for the wikimaps work, and for your thorough analysis of the 
pageviews :)

Here is what I found on your two questions, investigating one day of recent 
`user` pageview data (we keep detailed data for 90 days only, and I needed 
that detail for the analysis).

> What kind of query can cause these "-" entries?
Pages with a defined page_id and an undefined title ('-') represented 0.04% 
of hits, a bit more than 227k.
Among those, 152k requests had `curid=NUMBER` in their uri_query (meaning 
they specified the page to view only by id, and we don't extract page_title 
from ids).
More than 65k have neither a page-title nor a page-id specified in the URL, but 
have one specified in the HTTP headers. This feels like either a bug or 
unexpected user behavior.
And more than 10k use a `diff=` uri pattern, providing a diff between 
revisions of a given page without naming the page in the URL.
I also found, for mobile-app cases, that some page-titles were incorrectly 
rejected as invalid for the Chinese Wikipedia. This happens on a very small 
number of lines (fewer than 10 per day from my findings).
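Those URL patterns can be sketched as a toy classifier (a heuristic illustration of the cases listed above, not the actual pipeline logic):

```python
import re

def guess_missing_title_cause(uri_query: str) -> str:
    """Heuristic sketch of the request patterns described above that
    yield a '-' page_title; the real pageview pipeline is more involved."""
    if re.search(r"(?:^|[?&])curid=\d+", uri_query):
        return "curid-only"      # page addressed by id; titles are not derived from ids
    if re.search(r"(?:^|[?&])diff=", uri_query):
        return "diff-view"       # revision diff without the page named in the URL
    return "header-or-other"     # title/id only in HTTP headers, a bug, or odd usage
```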

> Why does the entry "Barack_Obama mobile-app" appear two times?
The entry appears two times because for one of them there is no page_id defined 
in the request; it is therefore categorised as different from the one that has 
a page_id defined. While it would be possible to bundle all rows with the same 
title and assign them a page_id when one of the rows has it defined, we could 
also run into problems for hours where a rename occurs (two different page_ids 
for the same title). I'll bring the concern to the team, but given the 
relatively small number of views impacted by this case, chances are we will 
not prioritise it soon.
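The rename hazard described here can be made concrete with a sketch that assigns a page_id to id-less rows only when the title maps to exactly one id within the hour (an illustrative helper, not the production logic):

```python
from collections import defaultdict

def assign_ids_by_title(rows):
    """Give id-less rows the page_id seen for the same title, but only
    when that id is unambiguous: a rename within the hour can produce
    two page_ids for one title, and those rows are left untouched.
    Illustrative sketch; `rows` holds (title, page_id, views) tuples."""
    ids_by_title = defaultdict(set)
    for title, page_id, _ in rows:
        if page_id is not None:
            ids_by_title[title].add(page_id)
    bundled = []
    for title, page_id, views in rows:
        if page_id is None and len(ids_by_title[title]) == 1:
            page_id = next(iter(ids_by_title[title]))  # safe: single id seen
        bundled.append((title, page_id, views))
    return bundled
```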

Please let us know if you have other questions :)
Best
Joseph





On Sun, Mar 14, 2021 at 1:53 AM Dan Andreescu <dandree...@wikimedia.org> wrote:
Thank you for your email and thoughtful analysis. I just wanted to say I saw it 
but got buried with other work. I'll try and reply early next week.

On Thu, Mar 11, 2021 at 03:50 Ogier Maitre <ogier.mai...@unil.ch> wrote:
Hello everybody,

We are currently working on a Wikipedia visualisation tool (presented 
here: http://www.wikimaps.io/). We use several pageview statistics to generate 
time series for each page from 2008 to 2020 (we use pagecounts, pageviews and 
pageview_complete). This last format is great for our work compared to the 
previous formats, and we use it for our data from 2016 to 2020. (Thanks to the 
Analytics team for that.)

We aggregate redirections as one page, identified by the page_id (as is done 
in the pageview_complete files).
But when we compare with the Wikimedia API, we see some small differences.

I think this problem comes from the fact that the Wikimedia API (and 
pageviews.toolforge.org) uses page_title to 
get the time series, and I saw that pageview_complete files contain entries 
where the page_title is missing (replaced by a "-"). As we are using page_id to 
do the aggregation whenever

Re: [Analytics] Pageview-complete entries labeled as "-"

2021-03-15 Thread Joseph Allemandou
Hi again Ogier,

> I don't exactly understand the part about the page_id being defined in
the request. I thought the page_id was "resolved" based on the page_title
being in the uri_query.

This is not how the page_id is set in our traffic datasets :)
We receive the page_id in HTTP-Header, set by the UIs.
We have historically received the values for `desktop` and `mobile-web`
pretty consistently, but the fact that we receive them for `mobile-app` is
new to me :)
I assume that getting data consistently will then be a matter of mobile-app
updates.

I hope this helps :)
Cheers
Joseph


On Mon, Mar 15, 2021 at 4:29 PM Ogier Maitre  wrote:

> Hello Joseph,
>
> Thank you for your detailed response.
> We suspected curid could be part of the equation here, but it is nice to
> have it confirmed here (at least for a part of the answer).
>
> > The entry appears two times because for one of them there is no page_id
> defined in the request, therefore it is categorised as different from the
> one having a page_id defined.
>
> I don't exactly understand the part about the page_id being defined in
> the request. I thought the page_id was "resolved" based on the page_title
> being in the uri_query.
> But this is more to satisfy my curiosity, as I'm currently bundling these
> entries with the ones that have a page_id, thanks to the page.sql table. I was
> mainly asking this in the hope of seeing these kinds of entries disappear in
> the future, which could simplify my aggregation process.
>
> Thank you again for your answer.
> Regards,
> Ogier
>
>
> On 15 March 2021 at 14:10, Joseph Allemandou wrote:
>
> Hello Ogier,
> Thank you a lot for the wikimaps work, and your thorough analysis on the
> pageviews :)
>
> Here is what I found on your two questions, investigating one day of
> `user` visited pageviews recent data (we keep detailed data for 90 days
> only and I needed those detailed for the analysis).
>
> > What kind of query can cause these "-" entries?
> Pages with a defined page_id and an undefined title ('-') represented
> 0.04% of hits, a bit more than 227k.
> Among those, 152k requests had `curid=NUMBER` in their uri_query
> (meaning they specified the page to view only by id, and we don't
> extract page_title from ids).
> More than 65k have neither a page-title nor a page-id specified in the URL,
> but have one specified in the HTTP headers. This feels like either a bug or
> unexpected user behavior.
> And more than 10k use a `diff=` uri pattern, providing a diff between
> revisions of a given page without naming the page in the URL.
> I also found, for mobile-app cases, that some page-titles were
> incorrectly rejected as invalid for the Chinese Wikipedia. This happens on a
> very small number of lines (fewer than 10 per day from my findings).
>
> > Why does the entry "Barack_Obama mobile-app" appear two times?
> The entry appears two times because for one of them there is no page_id
> defined in the request; it is therefore categorised as different from the
> one that has a page_id defined. While it would be possible to bundle all rows
> with the same title and assign them a page_id when one of the rows has it
> defined, we could also run into problems for hours where a rename occurs (two
> different page_ids for the same title). I'll bring the concern to the team,
> but given the relatively small number of views impacted by this case, chances
> are we will not prioritise it soon.
>
> Please let us know if you have other questions :)
> Best
> Joseph
>
>
>
>
>
> On Sun, Mar 14, 2021 at 1:53 AM Dan Andreescu 
> wrote:
>
>> Thank you for your email and thoughtful analysis, I just wanted to say I
>> saw it but got buried with other work.  I'll try and reply early next week.
>>
>> On Thu, Mar 11, 2021 at 03:50 Ogier Maitre  wrote:
>>
>>> Hello everybody,
>>>
>>> We are currently working on a Wikipedia visualisation tool (which is
>>> presented here: http://www.wikimaps.io/).  We use several pageview
>>> statistics to generate time series for each page from 2008 to 2020 (we use
>>> pagecounts, pageviews and pageview_complete). This last format is great for
>>> our work compared to the previous formats, and we use it for our data from
>>> 2016 to 2020. (Thanks to the Analytics team for that.)
>>>
>>> We aggregate redirections as one page, identified by the page_id (as is
>>> done in the pageview_complete files).
>>> But when we compare with the Wikimedia API, we see some small
>>> differences.
>>>
>>> I think this problem comes from the fact that the Wikimedia API (and
>>> pageviews.toolforge.org) uses page_title to get the time series, and I
>>> saw that pageview_complete files contain entries where the page_title is
>>> missing (replaced by a "-"). As we are using page_id to do the aggregation
>>> whenever it is possible, we aggregate these "-" entries, but
>>> pageviews.toolforge.org probably does not.
>>>
>>> For example for the page Barack_Obama in French, and the file
>>> `pageviews-20200112-user.bz2`, 

Re: [Analytics] Pageview-complete entries labeled as "-"

2021-03-15 Thread Ogier Maitre
Hello Joseph,

Thank you for your detailed response.
We suspected curid could be part of the equation here, but it is nice to have 
it confirmed here (at least for a part of the answer).

> The entry appears two times because for one of them there is no page_id 
> defined in the request, therefore it is categorised as different from the one 
> having a page_id defined.

I don't exactly understand the part about the page_id being defined in the 
request. I thought the page_id was "resolved" based on the page_title being in 
the uri_query.
But this is more to satisfy my curiosity, as I'm currently bundling these 
entries with the ones that have a page_id, thanks to the page.sql table. I was 
mainly asking this in the hope of seeing these kinds of entries disappear in 
the future, which could simplify my aggregation process.

Thank you again for your answer.
Regards,
Ogier


On 15 March 2021 at 14:10, Joseph Allemandou <jalleman...@wikimedia.org> wrote:

Hello Ogier,
Thank you a lot for the wikimaps work, and your thorough analysis on the 
pageviews :)

Here is what I found on your two questions, investigating one day of recent 
`user` pageview data (we keep detailed data for 90 days only, and I needed 
that detail for the analysis).

> What kind of query can cause these "-" entries?
Pages with a defined page_id and an undefined title ('-') represented 0.04% 
of hits, a bit more than 227k.
Among those, 152k requests had `curid=NUMBER` in their uri_query (meaning 
they specified the page to view only by id, and we don't extract page_title 
from ids).
More than 65k have neither a page-title nor a page-id specified in the URL, but 
have one specified in the HTTP headers. This feels like either a bug or 
unexpected user behavior.
And more than 10k use a `diff=` uri pattern, providing a diff between 
revisions of a given page without naming the page in the URL.
I also found, for mobile-app cases, that some page-titles were incorrectly 
rejected as invalid for the Chinese Wikipedia. This happens on a very small 
number of lines (fewer than 10 per day from my findings).

> Why does the entry "Barack_Obama mobile-app" appear two times?
The entry appears two times because for one of them there is no page_id defined 
in the request; it is therefore categorised as different from the one that has 
a page_id defined. While it would be possible to bundle all rows with the same 
title and assign them a page_id when one of the rows has it defined, we could 
also run into problems for hours where a rename occurs (two different page_ids 
for the same title). I'll bring the concern to the team, but given the 
relatively small number of views impacted by this case, chances are we will 
not prioritise it soon.

Please let us know if you have other questions :)
Best
Joseph





On Sun, Mar 14, 2021 at 1:53 AM Dan Andreescu <dandree...@wikimedia.org> wrote:
Thank you for your email and thoughtful analysis. I just wanted to say I saw it 
but got buried with other work. I'll try and reply early next week.

On Thu, Mar 11, 2021 at 03:50 Ogier Maitre <ogier.mai...@unil.ch> wrote:
Hello everybody,

We are currently working on a Wikipedia visualisation tool (presented 
here: http://www.wikimaps.io/). We use several pageview statistics to generate 
time series for each page from 2008 to 2020 (we use pagecounts, pageviews and 
pageview_complete). This last format is great for our work compared to the 
previous formats, and we use it for our data from 2016 to 2020. (Thanks to the 
Analytics team for that.)

We aggregate redirections as one page, identified by the page_id (as is done 
in the pageview_complete files).
But when we compare with the Wikimedia API, we see some small differences.

I think this problem comes from the fact that the Wikimedia API (and 
pageviews.toolforge.org) uses page_title to 
get the time series, and I saw that pageview_complete files contain entries 
where the page_title is missing (replaced by a "-"). As we are using page_id to 
do the aggregation whenever it is possible, we aggregate these "-" entries, but 
pageviews.toolforge.org probably does not.

For example for the page Barack_Obama in French, and the file 
`pageviews-20200112-user.bz2`, I get several relevant entries.


fr.wikipedia - 167398 mobile-web 1 B1
fr.wikipedia Barack 167398 mobile-web 1 X1
fr.wikipedia Barack_Hussein_Obama 167398 mobile-web 1 J1
fr.wikipedia Barack_Obama 167398 desktop 748 
A18B10C5D8E3F3G8H6I18J36K41L37M35N37O55P76Q65R57S48T29U56V42W23X32
fr.wikipedia Barack_Obama 167398 mobile-app 10 A1L1O1Q1T3U2V1
fr.wikipedia Barack_Obama 167398 mobile-web 1732 
A62B38C28D17E24F10G16H43I40J56K65L78M87N100O95P100Q93R127S84T128U124V184W84X49
fr.wikipedia Natasha_Obama 167398 desktop 3 Q1R2
fr.wikipedia Obama 167398 desktop 11 J2K1M1O1Q2R1S1U1W1
fr.wikipedia Obama 167398 mobile-web 2 R1V1
fr.wikipedia Obama_Barack 167398 
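The rows above (long hourly strings are wrapped by the mail client) follow the pageview_complete layout `wiki title page_id access daily_total hourly_string`, where the trailing string packs per-hour counts as letter+count pairs, A = hour 00 through X = hour 23, and the page_id field may be absent. A parsing sketch assuming one logical row per line, with field meanings inferred from the sample (verify against the dumps documentation before relying on it):

```python
import re

def parse_pageview_line(line):
    """Parse one pageview_complete row of the form
    `wiki title page_id access daily_total hourly_string`
    (page_id may be absent, as in the last Barack_Obama row).
    The hourly string packs per-hour counts as letter+count pairs,
    A = hour 00 ... X = hour 23."""
    parts = line.split(" ")
    if len(parts) == 5:                      # row without a page_id
        wiki, title, access, daily, hourly = parts
        page_id = None
    else:
        wiki, title, page_id, access, daily, hourly = parts
    # Decode e.g. "A1T3" -> {0: 1, 19: 3}
    hours = {ord(h) - ord("A"): int(n)
             for h, n in re.findall(r"([A-X])(\d+)", hourly)}
    return wiki, title, page_id, access, int(daily), hours
```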

Re: [Analytics] Pageview-complete entries labeled as "-"

2021-03-15 Thread Joseph Allemandou
Hello Ogier,
Thank you a lot for the wikimaps work, and your thorough analysis on the
pageviews :)

Here is what I found on your two questions, investigating one day of recent
`user` pageview data (we keep detailed data for 90 days only, and I needed
that detail for the analysis).

> What kind of query can cause these "-" entries?
Pages with a defined page_id and an undefined title ('-') represented 0.04%
of hits, a bit more than 227k.
Among those, 152k requests had `curid=NUMBER` in their uri_query
(meaning they specified the page to view only by id, and we don't
extract page_title from ids).
More than 65k have neither a page-title nor a page-id specified in the URL,
but have one specified in the HTTP headers. This feels like either a bug or
unexpected user behavior.
And more than 10k use a `diff=` uri pattern, providing a diff between
revisions of a given page without naming the page in the URL.
I also found, for mobile-app cases, that some page-titles were incorrectly
rejected as invalid for the Chinese Wikipedia. This happens on a very small
number of lines (fewer than 10 per day from my findings).

> Why does the entry "Barack_Obama mobile-app" appear two times?
The entry appears two times because for one of them there is no page_id
defined in the request; it is therefore categorised as different from the
one that has a page_id defined. While it would be possible to bundle all rows
with the same title and assign them a page_id when one of the rows has it
defined, we could also run into problems for hours where a rename occurs (two
different page_ids for the same title). I'll bring the concern to the team,
but given the relatively small number of views impacted by this case, chances
are we will not prioritise it soon.

Please let us know if you have other questions :)
Best
Joseph





On Sun, Mar 14, 2021 at 1:53 AM Dan Andreescu 
wrote:

> Thank you for your email and thoughtful analysis, I just wanted to say I
> saw it but got buried with other work.  I'll try and reply early next week.
>
> On Thu, Mar 11, 2021 at 03:50 Ogier Maitre  wrote:
>
>> Hello everybody,
>>
>> We are currently working on a Wikipedia visualisation tool (which is
>> presented here: http://www.wikimaps.io/).  We use several pageview
>> statistics to generate time series for each page from 2008 to 2020 (we use
>> pagecounts, pageviews and pageview_complete). This last format is great for
>> our work compared to the previous formats, and we use it for our data from
>> 2016 to 2020. (Thanks to the Analytics team for that.)
>>
>> We aggregate redirections as one page, identified by the page_id (as is
>> done in the pageview_complete files).
>> But when we compare with the Wikimedia API, we see some small
>> differences.
>>
>> I think this problem comes from the fact that the Wikimedia API (and
>> pageviews.toolforge.org) uses page_title to get the time series, and I
>> saw that pageview_complete files contain entries where the page_title is
>> missing (replaced by a "-"). As we are using page_id to do the aggregation
>> whenever it is possible, we aggregate these "-" entries, but
>> pageviews.toolforge.org probably does not.
>>
>> For example for the page Barack_Obama in French, and the file
>> `pageviews-20200112-user.bz2`, I get several relevant entries.
>>
>>
>> fr.wikipedia - 167398 mobile-web 1 B1
>> fr.wikipedia Barack 167398 mobile-web 1 X1
>> fr.wikipedia Barack_Hussein_Obama 167398 mobile-web 1 J1
>> fr.wikipedia Barack_Obama 167398 desktop 748
>> A18B10C5D8E3F3G8H6I18J36K41L37M35N37O55P76Q65R57S48T29U56V42W23X32
>> fr.wikipedia Barack_Obama 167398 mobile-app 10 A1L1O1Q1T3U2V1
>> fr.wikipedia Barack_Obama 167398 mobile-web 1732
>> A62B38C28D17E24F10G16H43I40J56K65L78M87N100O95P100Q93R127S84T128U124V184W84X49
>> fr.wikipedia Natasha_Obama 167398 desktop 3 Q1R2
>> fr.wikipedia Obama 167398 desktop 11 J2K1M1O1Q2R1S1U1W1
>> fr.wikipedia Obama 167398 mobile-web 2 R1V1
>> fr.wikipedia Obama_Barack 167398 desktop 3 N1P2
>> fr.wikipedia Sacha_Obama 167398 desktop 3 J1O2
>> fr.wikipedia Sacha_Obama 167398 mobile-web 1 C1
>>
>> fr.wikipedia Barack_Obama mobile-app 29 B1C1H4J1L1M2N3O3P1R3S5V1W2X1
>>
>>
>> That is 12 entries that use the page_id, and one that does not.
>>
>> I have two questions about that result.
>>
>> What kind of query can cause these "-" entries?
>> Why does the entry "Barack_Obama mobile-app" appear two times?
>>
>> Sorry for the long introduction and thank you for your time.
>>
>> Regards,
>> Ogier
>> ___
>> Analytics mailing list
>> Analytics@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>


-- 
Joseph Allemandou (joal) (he / him)
Staff Data Engineer
Wikimedia Foundation

Re: [Analytics] Pageview-complete entries labeled as "-"

2021-03-13 Thread Dan Andreescu
Thank you for your email and thoughtful analysis. I just wanted to say I saw it
but got buried with other work. I'll try and reply early next week.

On Thu, Mar 11, 2021 at 03:50 Ogier Maitre  wrote:

> Hello everybody,
>
> We are currently working on a Wikipedia visualisation tool (which is
> presented here: http://www.wikimaps.io/).  We use several pageview
> statistics to generate time series for each page from 2008 to 2020 (we use
> pagecounts, pageviews and pageview_complete). This last format is great for
> our work compared to the previous formats, and we use it for our data from
> 2016 to 2020. (Thanks to the Analytics team for that.)
>
> We aggregate redirections as one page, identified by the page_id (as is
> done in the pageview_complete files).
> But when we compare with the Wikimedia API, we see some small
> differences.
>
> I think this problem comes from the fact that the Wikimedia API (and
> pageviews.toolforge.org) uses page_title to get the time series, and I
> saw that pageview_complete files contain entries where the page_title is
> missing (replaced by a "-"). As we are using page_id to do the aggregation
> whenever it is possible, we aggregate these "-" entries, but
> pageviews.toolforge.org probably does not.
>
> For example for the page Barack_Obama in French, and the file
> `pageviews-20200112-user.bz2`, I get several relevant entries.
>
>
> fr.wikipedia - 167398 mobile-web 1 B1
> fr.wikipedia Barack 167398 mobile-web 1 X1
> fr.wikipedia Barack_Hussein_Obama 167398 mobile-web 1 J1
> fr.wikipedia Barack_Obama 167398 desktop 748
> A18B10C5D8E3F3G8H6I18J36K41L37M35N37O55P76Q65R57S48T29U56V42W23X32
> fr.wikipedia Barack_Obama 167398 mobile-app 10 A1L1O1Q1T3U2V1
> fr.wikipedia Barack_Obama 167398 mobile-web 1732
> A62B38C28D17E24F10G16H43I40J56K65L78M87N100O95P100Q93R127S84T128U124V184W84X49
> fr.wikipedia Natasha_Obama 167398 desktop 3 Q1R2
> fr.wikipedia Obama 167398 desktop 11 J2K1M1O1Q2R1S1U1W1
> fr.wikipedia Obama 167398 mobile-web 2 R1V1
> fr.wikipedia Obama_Barack 167398 desktop 3 N1P2
> fr.wikipedia Sacha_Obama 167398 desktop 3 J1O2
> fr.wikipedia Sacha_Obama 167398 mobile-web 1 C1
>
> fr.wikipedia Barack_Obama mobile-app 29 B1C1H4J1L1M2N3O3P1R3S5V1W2X1
>
>
> That is 12 entries that use the page_id, and one that does not.
>
> I have two questions about that result.
>
> What kind of query can cause these "-" entries?
> Why does the entry "Barack_Obama mobile-app" appear two times?
>
> Sorry for the long introduction and thank you for your time.
>
> Regards,
> Ogier
>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview Dumps outage?

2021-01-08 Thread Dan Andreescu
Marcus: we're also refactoring this dataset to be much more convenient
(much smaller with more data and all the history).  Stay tuned here or
follow the progress as we update:
https://dumps.wikimedia.org/other/pageview_complete/

On Fri, Jan 8, 2021 at 3:04 PM Luca Toscano  wrote:

> Hi Marcus,
>
> We are in the process of restoring the regular dumps workflow, more
> details in https://phabricator.wikimedia.org/T271362
> We should be able to be back in full service during this weekend, sorry
> for the inconvenience!
>
> Thanks,
>
> Luca (on behalf of the Analytics team)
>
> On Fri, Jan 8, 2021 at 8:44 PM Marcus Schorow 
> wrote:
>
>> Hi all,
>> I collect views for some terms for a project and noticed the dumps on
>> https://dumps.wikimedia.org/other/pageviews/2021/2021-01/ haven't been
>> updating since January 6th. Is this a known outage? Is there anywhere I can
>> check to see the status of the systems that create and dump the view
>> counts?
>>
>>
>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview Dumps outage?

2021-01-08 Thread Luca Toscano
Hi Marcus,

We are in the process of restoring the regular dumps workflow; more details
are in https://phabricator.wikimedia.org/T271362
We should be back in full service during this weekend; sorry for
the inconvenience!

Thanks,

Luca (on behalf of the Analytics team)

On Fri, Jan 8, 2021 at 8:44 PM Marcus Schorow 
wrote:

> Hi all,
> I collect views for some terms for a project and noticed the dumps on
> https://dumps.wikimedia.org/other/pageviews/2021/2021-01/ haven't been
> updating since January 6th. Is this a known outage? Is there anywhere I can
> check to see the status of the systems that create and dump the view
> counts?
>
>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] PageView

2018-03-02 Thread Nuria Ruiz
>Or is there another method you also count that is gathered for other
companies that collect views?
Companies that do this, such as comScore, do it by having their participants
install tracking software (normally desktop software) on their machines and
tracking the page views these participants make. Until recently, comScore
would only use desktop statistics (which seems to agree with your findings)
when reporting, for example, unique users. Because of this, their numbers
(which did not include mobile usage) were largely incorrect.

Please see:
https://meta.wikimedia.org/wiki/ComScore/Announcement

On Fri, Mar 2, 2018 at 8:41 AM, Marcel Ruiz Forns 
wrote:

> Sorry, forwarding to Analytics...
>
> Hi Angelina,
>
> I don't think there's any (legal) way of tracking Wikipedia traffic.
> All Wikipedia traffic data is protected by WMF's privacy policy[1]
> and handled accordingly.
>
> We do, however, provide public sanitized high-level statistics on page
> views for Wikipedia in various ways (not to specific companies or
> organizations, but rather to the world at large). What "Next Big Sound"
> is probably doing is consuming one of those public sources, but we
> don't know which one.
>
> These are 2 of the main sources this company might be grabbing stats from:
> https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews
> https://dumps.wikimedia.org/
>
> Cheers!
>
> [1] https://wikimediafoundation.org/wiki/Privacy_policy
>
>
>
> On Fri, Mar 2, 2018 at 5:16 PM, Marcel Ruiz Forns 
> wrote:
>
>> Hi Angelina,
>>
>> I don't think there's any (legal) way of tracking Wikipedia traffic.
>> All Wikipedia traffic data is protected by WMF's privacy policy[1]
>> and handled accordingly.
>>
>> We do, however, provide public sanitized high-level statistics on page
>> views for Wikipedia in various ways (not to specific companies or
>> organizations, but rather to the world at large). What "Next Big Sound"
>> is probably doing is consuming one of those public sources, but we
>> don't know which one.
>>
>> These are 2 of the main sources this company might be grabbing stats from:
>> https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews
>> https://dumps.wikimedia.org/
>>
>> Cheers!
>>
>> [1] https://wikimediafoundation.org/wiki/Privacy_policy
>>
>>
>> On Fri, Mar 2, 2018 at 4:19 PM, Marcel Ruiz Forns 
>> wrote:
>>
>>> Oh, forgot the subscribe link, here:
>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>
>>> Cheers!
>>>
>>> On Fri, Mar 2, 2018 at 4:18 PM, Marcel Ruiz Forns 
>>> wrote:
>>>
 Hi Angelina,

 I'm the administrator of this mailing-list. Just to let you know that
 your email was automatically filtered out by the mailing-list bot because
 your address is not subscribed to it. I just unblocked it, so you will
 receive a response shortly. However, please subscribe to send further
 emails to the list.

 Thanks!


 On Wed, Feb 28, 2018 at 5:04 PM, BTShasSTOLENmyHEART <
 zangeli...@gmail.com> wrote:

> Hello,
>
> I recently spoke with "Next Big Sound" which is a company that tracks
> Wikipedia page views on certain artists. They informed me that they got
> details of the views directly from Wikipedia (because I had emailed them
> that the View counts mentioned on Wikipedia and Next Big Sound show a 
> major
> discrepancy). There are rumors flying around saying that the information
> gathered is only from desktop views, in which case the counts are extremely
> similar. Is there any way you can confirm this as true? Or is there 
> another
> method you also count that is gathered for other companies that collect
> views? I know you have no idea of what Next Big Sound is presenting to the
> world wide audience, but I wanted to know if you can explain what
> information is given to Next Big Sound in terms of data. Thank you
>
>
> Sincerely,
>
> Angelina Zamora
>
>
>


 --
 *Marcel Ruiz Forns*
 Analytics Developer
 Wikimedia Foundation

>>>
>>>
>>>
>>> --
>>> *Marcel Ruiz Forns*
>>> Analytics Developer
>>> Wikimedia Foundation
>>>
>>
>>
>>
>> --
>> *Marcel Ruiz Forns*
>> Analytics Developer
>> Wikimedia Foundation
>>
>
>
>
> --
> *Marcel Ruiz Forns*
> Analytics Developer
> Wikimedia Foundation
>
>
>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] PageView

2018-03-02 Thread Marcel Ruiz Forns
Sorry, forwarding to Analytics...

Hi Angelina,

I don't think there's any (legal) way of tracking Wikipedia traffic.
All Wikipedia traffic data is protected by WMF's privacy policy[1]
and handled accordingly.

We do, however, provide public sanitized high-level statistics on page
views for Wikipedia in various ways (not to specific companies or
organizations, but rather to the world at large). What "Next Big Sound"
is probably doing is consuming one of those public sources, but we
don't know which one.

These are 2 of the main sources this company might be grabbing stats from:
https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews
https://dumps.wikimedia.org/

Cheers!

[1] https://wikimediafoundation.org/wiki/Privacy_policy



On Fri, Mar 2, 2018 at 5:16 PM, Marcel Ruiz Forns 
wrote:

> Hi Angelina,
>
> I don't think there's any (legal) way of tracking Wikipedia traffic.
> All Wikipedia traffic data is protected by WMF's privacy policy[1]
> and handled accordingly.
>
> We do, however, provide public sanitized high-level statistics on page
> views for Wikipedia in various ways (not to specific companies or
> organizations, but rather to the world at large). What "Next Big Sound"
> is probably doing, is consuming one of those public sources, but we
> don't know which one.
>
> These are 2 of the main sources this company might be grabbing stats from:
> https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews
> https://dumps.wikimedia.org/
>
> Cheers!
>
> [1] https://wikimediafoundation.org/wiki/Privacy_policy
>
>
> On Fri, Mar 2, 2018 at 4:19 PM, Marcel Ruiz Forns 
> wrote:
>
>> Oh, forgot the subscribe link, here:
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>> Cheers!
>>
>> On Fri, Mar 2, 2018 at 4:18 PM, Marcel Ruiz Forns 
>> wrote:
>>
>>> Hi Angelina,
>>>
>>> I'm the administrator of this mailing-list. Just to let you know that
>>> your email was automatically filtered out by the mailing-list bot because
>>> your address is not subscribed to it. I just unblocked it, so you will
>>> receive a response shortly. However, please subscribe to send further
>>> emails to the list.
>>>
>>> Thanks!
>>>
>>>
>>> On Wed, Feb 28, 2018 at 5:04 PM, BTShasSTOLENmyHEART <
>>> zangeli...@gmail.com> wrote:
>>>
 Hello,

 I recently spoke with "Next Big Sound" which is a company that tracks
 Wikipedia page views on certain artists. They informed me that they got
 details of the views directly from Wikipedia (because I had emailed them
 that the View counts mentioned on Wikipedia and Next Big Sound show a major
 discrepancy). There are rumors flying about saying that the information
 gathered is only from desktop views, in which case the counts are extremely
 similar. Is there any way you can confirm whether this is true? Or is there another
 method you also count that is gathered for other companies that collect
 views? I know you have no idea of what Next Big Sound is presenting to the
 world wide audience, but I wanted to know if you can explain what
 information is given to Next Big Sound in terms of data. Thank you


 Sincerely,

 Angelina Zamora

 ___
 Analytics mailing list
 Analytics@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/analytics


>>>
>>>
>>> --
>>> *Marcel Ruiz Forns*
>>> Analytics Developer
>>> Wikimedia Foundation
>>>
>>
>>
>>
>> --
>> *Marcel Ruiz Forns*
>> Analytics Developer
>> Wikimedia Foundation
>>
>
>
>
> --
> *Marcel Ruiz Forns*
> Analytics Developer
> Wikimedia Foundation
>



-- 
*Marcel Ruiz Forns*
Analytics Developer
Wikimedia Foundation
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview dumps lagging behind

2018-02-20 Thread Marcel Ruiz Forns
Hi John!

We send all our scheduled maintenance notifications to this mailing list.
You can subscribe here:
https://lists.wikimedia.org/mailman/listinfo/analytics

Cheers!

On Fri, Feb 16, 2018 at 5:48 PM, John Urbanik  wrote:

> Our team would greatly appreciate scheduled maintenance notifications for
> maintenance that would impact analytics services. Perhaps an additional
> list can be set up?
>
> On Fri, Feb 16, 2018 at 11:45 AM, Dan Andreescu 
> wrote:
>
>> Oh, my fault, this message is from a while back.  We had to pause the
>> cluster for a few days to do a big upgrade, now everything is operational
>> and you should be seeing data and dumps usually within 24 hours of when
>> you'd expect them.  If that's not the case, either we're performing some
>> scheduled maintenance or something could be wrong (but that rarely happens
>> and we usually announce it here if it does).  Going forward, would people
>> on this list like to be notified of scheduled maintenance?  It might be
>> spam for most people so we usually don't post a message about it.
>>
>> On Fri, Feb 16, 2018 at 11:43 AM, Dan Andreescu wrote:
>>
>>> Hi, how are you deducing that? I see files up to 2018-02-16 14:00:00
>>> (UTC), which is very up to date, only a few hours old.
>>>
>>> On Sun, Feb 11, 2018 at 4:57 AM, Spinner Cat  wrote:
>>>
 Hi all,

 Noticed that we're not getting any new pageview dumps on
 https://dumps.wikimedia.org/other/pageviews/2018/2018-02/ since Feb
 9th 17:08 UTC. Is this a known issue and when might we expect it to be
 resolved and files to catch up again?

 Thanks!

 ___
 Analytics mailing list
 Analytics@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/analytics

>>>
>>>
>>
>> ___
>> Analytics mailing list
>> Analytics@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>>
>
>
> --
>
> *JOHN URBANIK*
> Lead Data Engineer
>
> jurba...@predata.com
> 860.878.1010
> 379 West Broadway
> New York, NY 10012
>
>
> ___
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>


-- 
*Marcel Ruiz Forns*
Analytics Developer
Wikimedia Foundation
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview dumps lagging behind

2018-02-16 Thread Dan Andreescu
Oh, my fault, this message is from a while back.  We had to pause the
cluster for a few days to do a big upgrade, now everything is operational
and you should be seeing data and dumps usually within 24 hours of when
you'd expect them.  If that's not the case, either we're performing some
scheduled maintenance or something could be wrong (but that rarely happens
and we usually announce it here if it does).  Going forward, would people
on this list like to be notified of scheduled maintenance?  It might be
spam for most people so we usually don't post a message about it.

On Fri, Feb 16, 2018 at 11:43 AM, Dan Andreescu 
wrote:

> Hi, how are you deducing that? I see files up to 2018-02-16 14:00:00
> (UTC), which is very up to date, only a few hours old.
>
> On Sun, Feb 11, 2018 at 4:57 AM, Spinner Cat  wrote:
>
>> Hi all,
>>
>> Noticed that we're not getting any new pageview dumps on
>> https://dumps.wikimedia.org/other/pageviews/2018/2018-02/ since Feb
>> 9th 17:08 UTC. Is this a known issue and when might we expect it to be
>> resolved and files to catch up again?
>>
>> Thanks!
>>
>> ___
>> Analytics mailing list
>> Analytics@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>
>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview API with country?

2016-09-19 Thread Jonathan Van Parys
Thank you very much for your answer.

> On 13 Sep 2016, at 15:59, Nuria Ruiz  wrote:
> 
> Jonathan, 
> 
> Do send your questions to analytics@ to get a better/faster response. 
> 
> >I recently discovered the wonderful Wikimedia Analytics/Pageview API and was
> >wondering whether there are any plans to extend it to include pageviews by
> >country? It would be a great way to find out what people are interested in
> >in a specific country.
> No, there are no plans to this effect as data needs to be sanitized before we 
> can do this. We release pageview country reports on another format here: 
> https://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerCountryOverview.htm
>  
> 
> 
> Thanks,
> 
> Nuria
> 
> 
> On Tue, Sep 13, 2016 at 4:04 AM, Jonathan Van Parys wrote:
> Dear Nuria,
> 
> I recently discovered the wonderful Wikimedia Analytics/Pageview API and was 
> wondering whether there are any plans to extend it to include pageviews by 
> country? It would be a great way to find out what people are interested in in 
> a specific country.
> 
> Many thanks in advance,
> 
> Jonathan
> 

___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview API with country?

2016-09-13 Thread Nuria Ruiz
Jonathan,

Do send your questions to analytics@ to get a better/faster response.

>I recently discovered the wonderful Wikimedia Analytics/Pageview API and
was wondering whether there are any plans to extend it to include pageviews
by country? It would be a great way to find out what people are interested
in in a specific country.
No, there are no plans to this effect as data needs to be sanitized before
we can do this. We release pageview country reports on another format here:
https://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerCountryOverview.htm

Thanks,

Nuria


On Tue, Sep 13, 2016 at 4:04 AM, Jonathan Van Parys 
wrote:

> Dear Nuria,
>
> I recently discovered the wonderful Wikimedia Analytics/Pageview API and
> was wondering whether there are any plans to extend it to include pageviews
> by country? It would be a great way to find out what people are interested
> in in a specific country.
>
> Many thanks in advance,
>
> Jonathan
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] pageview counts on page redirects

2016-08-29 Thread Timo Tijhof
On 25 August 2016 at 05:25, Aubrey Rembert  wrote:

> hi,
>
> our team is trying to determine how pageviews are attributed to pages that
> redirect to other pages.
>
> for instance, the page Panic!_at_the_disco redirects to the page
> Panic!_at_the_Disco, however, in the pageview dumps file
> there is an entry for both Panic!_at_the_disco and Panic!_at_the_Disco.
> does this mean that a single visit to the page Panic!_at_the_disco
> generates two entries
> in the pageview dumps file (one entry for the source page of the redirect
> and another for the target page of the redirect)?
>
>

Dario's reply covers it but just wanted to elaborate a tiny bit:

For traditional web redirects your assumption would be true. There, a
request for a redirect url would get a 30x HTTP response with a pointer to
the target url. In which case the web browser will make a subsequent
request for the target url so that it can show the desired page.

However wiki redirects don't work that way. A request for a redirect wiki
page immediately results in a response for the target article (with a
little "Redirect from" caption under the title). So there'll be only 1 web
request, 1 response, and 1 page view, logged under the redirect name.

-- Timo
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] pageview counts on page redirects

2016-08-27 Thread Dario Taraborelli
Pageview data (both in the dumps and pageviews API) is counted for the
nominal page title as requested, i.e. it is agnostic as to what that title
redirects to.

To obtain a complete dataset of pageviews across all redirects for a given
page you would need to reconstruct its redirect graph over the time range
you're interested in, which is a pretty laborious process.
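
For illustration, once you have such a redirect mapping (e.g. built from the
page/redirect SQL dumps), folding the per-title counts together is the easy
part; a toy sketch (one-hop redirects only, all names made up):

```python
from collections import Counter

def aggregate_views(view_counts, redirects):
    """Sum per-title pageview counts into their redirect targets.

    view_counts: {title: views} as counted in the dumps (under the name
    actually requested). redirects: {source_title: target_title}; titles
    absent from the mapping count toward themselves. Redirect chains and
    double redirects are not followed here.
    """
    totals = Counter()
    for title, views in view_counts.items():
        totals[redirects.get(title, title)] += views
    return dict(totals)

views = {"Panic!_at_the_disco": 40, "Panic!_at_the_Disco": 960}
redirects = {"Panic!_at_the_disco": "Panic!_at_the_Disco"}
aggregate_views(views, redirects)  # → {"Panic!_at_the_Disco": 1000}
```

The laborious part, as noted, is reconstructing the mapping over the time
range, since the redirects themselves change over time.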

If you're doing research on this topic, you may be interested in this
recent work by Mako Hill and Aaron Shaw looking at redirects and how they
affect the quality of data on Wikipedia articles.

*Consider the Redirect: A Missing Dimension of Wikipedia Research*
http://dx.doi.org/10.1145/2641580.2641616

HTH,
Dario



On Thu, Aug 25, 2016 at 5:25 AM, Aubrey Rembert 
wrote:

> hi,
>
> our team is trying to determine how pageviews are attributed to pages that
> redirect to other pages.
>
> for instance, the page Panic!_at_the_disco redirects to the page
> Panic!_at_the_Disco, however, in the pageview dumps file
> there is an entry for both Panic!_at_the_disco and Panic!_at_the_Disco.
> does this mean that a single visit to the page Panic!_at_the_disco
> generates two entries
> in the pageview dumps file (one entry for the source page of the redirect
> and another for the target page of the redirect)?
>
> -best,
> -ace
>
>
> ___
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>


-- 

*Dario Taraborelli  *Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter

___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] pageview types from the new pageviews data set

2016-08-17 Thread Dan Andreescu
The answer is a bit confusing, so I just spent a few hours updating the
documentation of our datasets:
https://wikitech.wikimedia.org/wiki/Analytics/Data.  The link for the
pageviews dataset is now:
https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageviews

To answer your question directly, .q is actually the project abbreviation
for wikiquote.  This is lightly inspired by the syntax for inter-wiki
links, but it's explained in detail in this section:

https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageviews#Contained_data

The .m usually means access via the mobile site, unless .m.m appears in
which case it could be something else, that's also described here in detail:

https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageviews#Disambiguating_abbreviations_ending_in_.E2.80.9C.m.E2.80.9D

If you're still confused after reading that and trying it out, let us know.
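
To make the abbreviation scheme concrete, here is a rough sketch of splitting
one dump line ("project page_title count_views total_response_size") and
classifying a few common suffixes. It deliberately ignores the ".m"-vs-wikimedia
ambiguity discussed on that page (every ".m" is read as mobile here), so treat
it as illustrative only:

```python
# Suffix -> project family, per the abbreviations described on the
# Pageviews data page; bare language codes default to wikipedia.
SUFFIX_FAMILY = {"b": "wikibooks", "d": "wiktionary", "n": "wikinews",
                 "q": "wikiquote", "s": "wikisource", "v": "wikiversity"}

def parse_line(line):
    """Split one 'project title count bytes' dump line into fields."""
    project, title, count, _size = line.split(" ")
    parts = project.split(".")
    mobile = "m" in parts[1:]  # caveat: '.m' can also mean wikimedia.org
    family = next((SUFFIX_FAMILY[p] for p in parts[1:] if p in SUFFIX_FAMILY),
                  "wikipedia")
    return {"lang": parts[0], "family": family, "mobile": mobile,
            "title": title, "views": int(count)}

parse_line("en.m Main_Page 3 0")
# → {'lang': 'en', 'family': 'wikipedia', 'mobile': True,
#    'title': 'Main_Page', 'views': 3}
```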

On Mon, Aug 15, 2016 at 12:54 PM, Aubrey Rembert 
wrote:

> our team is trying to distinguish mobile pageviews from non-mobile
> pageviews in the new data feed https://dumps.wikimedia.org/other/pageviews/.
> we are under the impression that the .m and .zero page view type
> extensions are page views of the mobile site. is this correct?
> what does the extension .q mean? also, can we safely assume that if a page
> view has a language but no extension (.m, .zero, .q, etc…), then it is a
> view of the desktop site?
>
> thanks!
>
> ___
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] [Pageview API] Data Retention Question

2016-07-29 Thread Toby Negrin
Just curious -- how much would it cost to make all of the data available at
a daily granularity for a year?

On Fri, Jul 29, 2016 at 4:30 PM, Jonathan Morgan 
wrote:

> Hi Dan,
>
> Making dumps much easier to use would definitely help. We Wikipedia
> researchers are kind of spoiled: we have easy public access to historical
> revision data for all projects, going back to 2001, through the API *and*
> public db endpoints like Quarry. It's only natural that we want the same
> thing with pageviews!!! :)
>
> I can think of other use-cases for keeping more than 18 months of data
> available through the API, but they're all research use cases. I don't
> think having lower-granularity historical data available beyond a certain
> point is helpful for those--if you're doing historical analysis, you want
> consistency. But a application that parsed dumps on the server-side to
> yield historical data (ideally in a format and granularity that wasn't
> fundamentally different from that of the API, so you could join the
> streams) would definitely be useful, and would probably address most
> research needs I can think of, inside and outside the Foundation.
>
> Thanks for asking,
> Jonathan
>
> On Fri, Jul 29, 2016 at 12:27 PM, Dan Andreescu 
> wrote:
>
>> Amir and Jonathan - thanks for speaking up for the "more than 18 months"
>> use cases.  If dumps were *much* easier to use (via python clients that
>> made it transparent whether you were hitting the API or not), would that be
>> an acceptable solution?  I feel like both of your use cases are not things
>> that will be happening on a daily basis.  If that's true, another solution
>> would be an ad-hoc API that took in a filter and a date range, applied it
>> server-side, and gave you a partial dump with only the interesting data.
>> If this didn't happen very often, it would allow us to trade processing
>> time and a bit of dev time for more expensive storage.
>>
>> Or, if we end up needing frequent access to old data, we should be able
>> to justify spending more money on more servers.  Just trying to save as
>> much money as possible :)
>>
>> Thanks all so far, please feel free to keep chiming in if you have other
>> use cases that haven't been covered, or if you'd like to add more weight
>> behind the "more than 18 months" use cases.
>>
>> On Fri, Jul 29, 2016 at 3:18 PM, Leila Zia  wrote:
>>
>>> Dan, Thanks for reaching out.
>>>
>>> 18 months is enough for my use cases as long as the dumps capture the
>>> exact data structure.
>>>
>>> Best,
>>> Leila
>>>
>>> --
>>> Leila Zia
>>> Senior Research Scientist
>>> Wikimedia Foundation
>>>
>>> On Fri, Jul 29, 2016 at 11:51 AM, Amir E. Aharoni <
>>> amir.ahar...@mail.huji.ac.il> wrote:
>>>
 I am now checking traffic data every day to see whether Compact
 Language Links affect it. It makes sense to compare them not only to the
 previous week, but also to the same month the previous year. So one year is
 hardly enough. 18 months is better, and three years is much better because
 I'll be able to check also the same month in earlier years.

 I imagine that this may be useful to all product managers that work on
 features that can affect traffic.

 On 29 Jul 2016 at 15:41, "Dan Andreescu" wrote:

> Dear Pageview API consumers,
>
> We would like to plan storage capacity for our pageview API cluster.
> Right now, with a reliable RAID setup, we can keep *18 months* of
> data.  If you'd like to query further back than that, you can download 
> dump
> files (which we'll make easier to use with python utilities).
>
> What do you think?  Will you need more than 18 months of data?  If so,
> we need to add more nodes when we get to that point, and that costs money,
> so we want to check if there is a real need for it.
>
> Another option is to start degrading the resolution for older data
> (only keep weekly or monthly for data older than 1 year for example).  If
> you need more than 18 months, we'd love to hear your use case and 
> something
> in the form of:
>
> need daily resolution for 1 year
> need weekly resolution for 2 years
> need monthly resolution for 3 years
>
> Thank you!
>
> Dan
>
> ___
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>
 ___
 Analytics mailing list
 Analytics@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/analytics


>>>
>>> ___
>>> Analytics mailing list
>>> Analytics@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>
>>>
>>
>> 

Re: [Analytics] [Pageview API] Data Retention Question

2016-07-29 Thread Leila Zia
Dan, Thanks for reaching out.

18 months is enough for my use cases as long as the dumps capture the exact
data structure.

Best,
Leila

--
Leila Zia
Senior Research Scientist
Wikimedia Foundation

On Fri, Jul 29, 2016 at 11:51 AM, Amir E. Aharoni <
amir.ahar...@mail.huji.ac.il> wrote:

> I am now checking traffic data every day to see whether Compact Language
> Links affect it. It makes sense to compare them not only to the previous
> week, but also to the same month the previous year. So one year is hardly
> enough. 18 months is better, and three years is much better because I'll be
> able to check also the same month in earlier years.
>
> I imagine that this may be useful to all product managers that work on
> features that can affect traffic.
>
> On 29 Jul 2016 at 15:41, "Dan Andreescu" wrote:
>
>> Dear Pageview API consumers,
>>
>> We would like to plan storage capacity for our pageview API cluster.
>> Right now, with a reliable RAID setup, we can keep *18 months* of data.
>> If you'd like to query further back than that, you can download dump files
>> (which we'll make easier to use with python utilities).
>>
>> What do you think?  Will you need more than 18 months of data?  If so, we
>> need to add more nodes when we get to that point, and that costs money, so
>> we want to check if there is a real need for it.
>>
>> Another option is to start degrading the resolution for older data (only
>> keep weekly or monthly for data older than 1 year for example).  If you
>> need more than 18 months, we'd love to hear your use case and something in
>> the form of:
>>
>> need daily resolution for 1 year
>> need weekly resolution for 2 years
>> need monthly resolution for 3 years
>>
>> Thank you!
>>
>> Dan
>>
>> ___
>> Analytics mailing list
>> Analytics@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>>
> ___
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] [Pageview API] Data Retention Question

2016-07-29 Thread Amir E. Aharoni
I am now checking traffic data every day to see whether Compact Language
Links affect it. It makes sense to compare them not only to the previous
week, but also to the same month the previous year. So one year is hardly
enough. 18 months is better, and three years is much better because I'll be
able to check also the same month in earlier years.

I imagine that this may be useful to all product managers that work on
features that can affect traffic.

On 29 Jul 2016 at 15:41, "Dan Andreescu" wrote:

> Dear Pageview API consumers,
>
> We would like to plan storage capacity for our pageview API cluster.
> Right now, with a reliable RAID setup, we can keep *18 months* of data.
> If you'd like to query further back than that, you can download dump files
> (which we'll make easier to use with python utilities).
>
> What do you think?  Will you need more than 18 months of data?  If so, we
> need to add more nodes when we get to that point, and that costs money, so
> we want to check if there is a real need for it.
>
> Another option is to start degrading the resolution for older data (only
> keep weekly or monthly for data older than 1 year for example).  If you
> need more than 18 months, we'd love to hear your use case and something in
> the form of:
>
> need daily resolution for 1 year
> need weekly resolution for 2 years
> need monthly resolution for 3 years
>
> Thank you!
>
> Dan
>
> ___
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] [Pageview API] Data Retention Question

2016-07-29 Thread Jonathan Morgan
My use case: historical data beyond 18 months would be really useful for
teaching data science.

This spring, I had a bunch of college programming students using the
PageView API in combination with the standard MW API in their course
research projects. They tracked edits and views to particular pages over
time (example: Wikipedia articles about television shows like *Game of
Thrones* and *Silicon Valley*). Goal was to understand whether the release
of a new episode/season triggered an increase in edits to the Wikipedia
article, or just views.

In terms of granularity: article pageviews spike and fall rapidly in
response to external events. Reducing the granularity to weekly or monthly
would make the data less useful, because it averages out a lot of these
interesting dynamics.
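
To illustrate the averaging-out point with made-up numbers: a one-day spike
around an episode release is obvious at daily resolution but flattens away
once the week is averaged into a single bucket:

```python
def downsample(daily, width):
    """Average consecutive daily counts into buckets of up to `width` days."""
    return [sum(daily[i:i + width]) / len(daily[i:i + width])
            for i in range(0, len(daily), width)]

daily = [100, 100, 100, 5000, 100, 100, 100]  # release-day spike (made up)
downsample(daily, 7)  # → [800.0]: the 50x spike becomes one dull number
```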

Parsing the dumps is not a huge deal, but it involves several additional
steps and requires somewhat more expertise.

- Jonathan

On Fri, Jul 29, 2016 at 5:40 AM, Dan Andreescu 
wrote:

> Dear Pageview API consumers,
>
> We would like to plan storage capacity for our pageview API cluster.
> Right now, with a reliable RAID setup, we can keep *18 months* of data.
> If you'd like to query further back than that, you can download dump files
> (which we'll make easier to use with python utilities).
>
> What do you think?  Will you need more than 18 months of data?  If so, we
> need to add more nodes when we get to that point, and that costs money, so
> we want to check if there is a real need for it.
>
> Another option is to start degrading the resolution for older data (only
> keep weekly or monthly for data older than 1 year for example).  If you
> need more than 18 months, we'd love to hear your use case and something in
> the form of:
>
> need daily resolution for 1 year
> need weekly resolution for 2 years
> need monthly resolution for 3 years
>
> Thank you!
>
> Dan
>
> ___
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>


-- 
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) 
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] [Pageview API] Data Retention Question

2016-07-29 Thread Jan Ainali
Would that mean that API calls would break over time? E.g., if I had a call
for daily resolution over a specific time period 17 months back, it would
break in a month?

/Jan Ainali
(sent on the go, so please excuse my brevity)

On Jul 29, 2016 15:41, "Dan Andreescu"  wrote:

> Dear Pageview API consumers,
>
> We would like to plan storage capacity for our pageview API cluster.
> Right now, with a reliable RAID setup, we can keep *18 months* of data.
> If you'd like to query further back than that, you can download dump files
> (which we'll make easier to use with python utilities).
>
> What do you think?  Will you need more than 18 months of data?  If so, we
> need to add more nodes when we get to that point, and that costs money, so
> we want to check if there is a real need for it.
>
> Another option is to start degrading the resolution for older data (only
> keep weekly or monthly for data older than 1 year for example).  If you
> need more than 18 months, we'd love to hear your use case and something in
> the form of:
>
> need daily resolution for 1 year
> need weekly resolution for 2 years
> need monthly resolution for 3 years
>
> Thank you!
>
> Dan
>
> ___
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] [Pageview API] Data Retention Question

2016-07-29 Thread Toby Negrin
My personal use cases which are primarily using the visualization tools
would appreciate more dimensionality in daily and weekly views (which
increases storage). I think you should definitely degrade the resolution,
possibly more aggressively than you propose. RRDTool has been doing this
for decades so people are used to it.

-Toby

On Fri, Jul 29, 2016 at 5:40 AM, Dan Andreescu 
wrote:

> Dear Pageview API consumers,
>
> We would like to plan storage capacity for our pageview API cluster.
> Right now, with a reliable RAID setup, we can keep *18 months* of data.
> If you'd like to query further back than that, you can download dump files
> (which we'll make easier to use with python utilities).
>
> What do you think?  Will you need more than 18 months of data?  If so, we
> need to add more nodes when we get to that point, and that costs money, so
> we want to check if there is a real need for it.
>
> Another option is to start degrading the resolution for older data (only
> keep weekly or monthly for data older than 1 year for example).  If you
> need more than 18 months, we'd love to hear your use case and something in
> the form of:
>
> need daily resolution for 1 year
> need weekly resolution for 2 years
> need monthly resolution for 3 years
>
> Thank you!
>
> Dan
>
> ___
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] [Pageview API] Data Retention Question

2016-07-29 Thread Alex Druk
Glad to be helpful!
It is interesting that, contrary to my predictions, the number of visitors to
wikipediatrends.com and downloads did not drop when the API became available. I
am wondering whether stats.grok.se noticed a significant drop.
Alex

On Fri, Jul 29, 2016 at 6:08 PM, Dan Andreescu 
wrote:

> That's very useful, Alex, thanks!  I guess those requests would need to be
> covered by dumps anyway, since we only have data back to 2015.  I'll ping
> Henrik too.
>
> On Fri, Jul 29, 2016 at 12:06 PM, Alex Druk  wrote:
>
>> Maybe it makes sense to ask Henrik (stats.grok.se) for his download
>> stats.
>> We at wikipediatrends.com usually receive < 5 requests/month for full
>> data (from 2008).
>>
>> On Fri, Jul 29, 2016 at 2:40 PM, Dan Andreescu 
>> wrote:
>>
>>> Dear Pageview API consumers,
>>>
>>> We would like to plan storage capacity for our pageview API cluster.
>>> Right now, with a reliable RAID setup, we can keep *18 months* of
>>> data.  If you'd like to query further back than that, you can download dump
>>> files (which we'll make easier to use with python utilities).
>>>
>>> What do you think?  Will you need more than 18 months of data?  If so,
>>> we need to add more nodes when we get to that point, and that costs money,
>>> so we want to check if there is a real need for it.
>>>
>>> Another option is to start degrading the resolution for older data (only
>>> keep weekly or monthly for data older than 1 year for example).  If you
>>> need more than 18 months, we'd love to hear your use case and something in
>>> the form of:
>>>
>>> need daily resolution for 1 year
>>> need weekly resolution for 2 years
>>> need monthly resolution for 3 years
>>>
>>> Thank you!
>>>
>>> Dan
>>>
>>> ___
>>> Analytics mailing list
>>> Analytics@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>
>>>
>>
>>
>> --
>> Thank you.
>>
>> Alex Druk
>> alex.d...@gmail.com
>> (775) 237-8550 Google voice
>>
>> ___
>> Analytics mailing list
>> Analytics@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>>
>
> ___
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>


-- 
Thank you.

Alex Druk
alex.d...@gmail.com
(775) 237-8550 Google voice
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] [Pageview API] Data Retention Question

2016-07-29 Thread Dan Andreescu
That's very useful, Alex, thanks!  I guess those requests would need to be
covered by dumps anyway, since we only have data back to 2015.  I'll ping
Henrik too.

On Fri, Jul 29, 2016 at 12:06 PM, Alex Druk  wrote:

> Maybe it makes sense to ask Henrik (stats.grok.se) for his download stats.
> We at wikipediatrends.com usually receive < 5 requests/month for full data
> (from 2008).
>
> On Fri, Jul 29, 2016 at 2:40 PM, Dan Andreescu 
> wrote:
>
>> Dear Pageview API consumers,
>>
>> We would like to plan storage capacity for our pageview API cluster.
>> Right now, with a reliable RAID setup, we can keep *18 months* of data.
>> If you'd like to query further back than that, you can download dump files
>> (which we'll make easier to use with python utilities).
>>
>> What do you think?  Will you need more than 18 months of data?  If so, we
>> need to add more nodes when we get to that point, and that costs money, so
>> we want to check if there is a real need for it.
>>
>> Another option is to start degrading the resolution for older data (only
>> keep weekly or monthly for data older than 1 year for example).  If you
>> need more than 18 months, we'd love to hear your use case and something in
>> the form of:
>>
>> need daily resolution for 1 year
>> need weekly resolution for 2 years
>> need monthly resolution for 3 years
>>
>> Thank you!
>>
>> Dan
>>
>> ___
>> Analytics mailing list
>> Analytics@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>>
>
>
> --
> Thank you.
>
> Alex Druk
> alex.d...@gmail.com
> (775) 237-8550 Google voice
>
> ___
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] [Pageview API] Data Retention Question

2016-07-29 Thread Alex Druk
Maybe it makes sense to ask Henrik (stats.grok.se) for his download stats.
We at wikipediatrends.com usually receive < 5 requests/month for full data
(from 2008).

On Fri, Jul 29, 2016 at 2:40 PM, Dan Andreescu 
wrote:

> Dear Pageview API consumers,
>
> We would like to plan storage capacity for our pageview API cluster.
> Right now, with a reliable RAID setup, we can keep *18 months* of data.
> If you'd like to query further back than that, you can download dump files
> (which we'll make easier to use with python utilities).
>
> What do you think?  Will you need more than 18 months of data?  If so, we
> need to add more nodes when we get to that point, and that costs money, so
> we want to check if there is a real need for it.
>
> Another option is to start degrading the resolution for older data (only
> keep weekly or monthly for data older than 1 year for example).  If you
> need more than 18 months, we'd love to hear your use case and something in
> the form of:
>
> need daily resolution for 1 year
> need weekly resolution for 2 years
> need monthly resolution for 3 years
>
> Thank you!
>
> Dan
>
>
>


-- 
Thank you.

Alex Druk
alex.d...@gmail.com
(775) 237-8550 Google voice
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
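
For readers in this thread who do need history beyond the API's retention window, Dan's note points at the downloadable dump files. Below is a minimal sketch of reading one decompressed hourly file; the space-separated four-field line format (domain_code, page_title, count_views, total_response_size) should be checked against the dumps documentation, and the helper names and example values are mine, not anything the thread specifies.

```python
# Sketch: summing views for one article from an hourly pageview dump file.
# Assumptions (verify against dumps.wikimedia.org docs): each line is
# "domain_code page_title count_views total_response_size", e.g.
# "en Main_Page 12345 0"; page titles use underscores, never spaces.
from collections import namedtuple

PageviewLine = namedtuple("PageviewLine", "project title views size")

def parse_line(line):
    """Parse one dump line such as 'en Main_Page 12345 0'."""
    project, title, views, size = line.split(" ")
    return PageviewLine(project, title, int(views), int(size))

def views_for(path, project, title):
    """Sum views for one article over every line of a decompressed dump file."""
    total = 0
    with open(path, encoding="utf-8") as f:
        for raw in f:
            rec = parse_line(raw.rstrip("\n"))
            if rec.project == project and rec.title == title:
                total += rec.views
    return total
```

Since each file covers one hour, a daily or monthly series is just this loop repeated over the files for that period, which is why degraded resolution for old data would mainly save server storage rather than make this kind of offline analysis impossible.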


Re: [Analytics] Pageview analysis graphs not loading

2016-06-23 Thread Pine W
:) I owe him a barnstar.

Pine

On Thu, Jun 23, 2016 at 12:38 AM, Ryan Kaldari 
wrote:

> Musikanimal fixed it.
>
>
> On Jun 23, 2016, at 1:09 AM, Pine W  wrote:
>
> Thanks Toby.
>
> Pine
> On Jun 22, 2016 16:02, "Toby Negrin"  wrote:
>
>> ok -- I'm getting a spinning wheel of doom where the graphs used to be. I
>> suspect there's something amiss with the underlying service.
>>
>> https://phabricator.wikimedia.org/T138448
>>
>> -Toby
>>
>> On Wed, Jun 22, 2016 at 5:56 PM, Pine W  wrote:
>>
>>> Hi Erik,
>>>
>>> Examples:
>>>
>>>
>>> https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org=all-access=user=latest-20=Album
>>>
>>>
>>> https://tools.wmflabs.org/pageviews/?project=meta.wikimedia.org=all-access=user=latest-20=Main_Page
>>>
>>> Toby, thanks for the suggestion, but I tried multiple browsers with no
>>> ad blockers. The graphs still don't display.
>>>
>>> Pine
>>>
>>>
>>>
>>
>>
>
>
>
>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview analysis graphs not loading

2016-06-23 Thread Ryan Kaldari
Musikanimal fixed it.


> On Jun 23, 2016, at 1:09 AM, Pine W  wrote:
> 
> Thanks Toby.
> 
> Pine
> 
>> On Jun 22, 2016 16:02, "Toby Negrin"  wrote:
>> ok -- I'm getting a spinning wheel of doom where the graphs used to be. I 
>> suspect there's something amiss with the underlying service.
>> 
>> https://phabricator.wikimedia.org/T138448
>> 
>> -Toby
>> 
>>> On Wed, Jun 22, 2016 at 5:56 PM, Pine W  wrote:
>>> Hi Erik,
>>> 
>>> Examples:
>>> 
>>> https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org=all-access=user=latest-20=Album
>>> 
>>> https://tools.wmflabs.org/pageviews/?project=meta.wikimedia.org=all-access=user=latest-20=Main_Page
>>> 
>>> Toby, thanks for the suggestion, but I tried multiple browsers with no ad 
>>> blockers. The graphs still don't display.
>>> 
>>> Pine
>>> 
>>> 
>> 
>> 
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview analysis graphs not loading

2016-06-22 Thread Pine W
Thanks Toby.

Pine
On Jun 22, 2016 16:02, "Toby Negrin"  wrote:

> ok -- I'm getting a spinning wheel of doom where the graphs used to be. I
> suspect there's something amiss with the underlying service.
>
> https://phabricator.wikimedia.org/T138448
>
> -Toby
>
> On Wed, Jun 22, 2016 at 5:56 PM, Pine W  wrote:
>
>> Hi Erik,
>>
>> Examples:
>>
>>
>> https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org=all-access=user=latest-20=Album
>>
>>
>> https://tools.wmflabs.org/pageviews/?project=meta.wikimedia.org=all-access=user=latest-20=Main_Page
>>
>> Toby, thanks for the suggestion, but I tried multiple browsers with no ad
>> blockers. The graphs still don't display.
>>
>> Pine
>>
>>
>>
>
> ___
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview analysis graphs not loading

2016-06-22 Thread Pine W
Hi Erik,

Examples:

https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org=all-access=user=latest-20=Album

https://tools.wmflabs.org/pageviews/?project=meta.wikimedia.org=all-access=user=latest-20=Main_Page

Toby, thanks for the suggestion, but I tried multiple browsers with no ad
blockers. The graphs still don't display.

Pine
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview analysis graphs not loading

2016-06-22 Thread Toby Negrin
Turn off your ad-blocker and it should work. At least this solved my issues.

-Toby

On Wed, Jun 22, 2016 at 4:52 PM, Pine W  wrote:

> Hi folks,
>
> I can't get pageview analysis graphs to load on 2 wikis that I've tested,
> and I've tried desktop and mobile on multiple browsers. Can someone take a
> look at what might need fixing?
>
> Thanks!
>
> Pine
>
>
>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview stats tools

2016-02-03 Thread Jan Ainali
A brief update on the students' work. They have just finished a Project
Planning Document (PPD), which is a formal part of their course (20 pages
long). This week they are working on mockups, which will hopefully be ready
on Friday and shared for feedback.

Med vänliga hälsningar
Jan Ainali
http://ainali.com

2016-02-03 8:57 GMT+01:00 Pine W :

> https://phabricator.wikimedia.org/T120497. It's great to see the amount
> of interest in this!
>
> Pine
>
> On Tue, Feb 2, 2016 at 11:43 PM, Quim Gil  wrote:
>
>>
>>
>> On Tue, Feb 2, 2016 at 11:14 PM, Dan Andreescu 
>> wrote:
>>>
>>> Yes there is, a group of students from Sweden are working on the first
>>> attempt.
>>>
>>
>> Is there a URL to learn more (i.e. a Phabricator task)? This is
>> interesting news, and we might want to advertize.
>>
>> --
>> Quim Gil
>> Engineering Community Manager @ Wikimedia Foundation
>> http://www.mediawiki.org/wiki/User:Qgil
>>
>>
>>
>
>
>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview stats tools

2016-02-03 Thread Pine W
Thanks for the update! Would it be possible for the students to publish
their PPD on Commons under a CC-BY or CC-BY-SA license?

Looking forward to seeing the mockups,

Pine

On Wed, Feb 3, 2016 at 12:54 AM, Jan Ainali  wrote:

> A brief update on the students work. They have just finished a Project
> Planning Document (PPD), which is a formal part of their course (20 pages
> long). This week they are working on mockups, which hopefully will be ready
> on Friday and shared for feedback.
>
> Med vänliga hälsningar
> Jan Ainali
> http://ainali.com
>
> 2016-02-03 8:57 GMT+01:00 Pine W :
>
>> https://phabricator.wikimedia.org/T120497. It's great to see the amount
>> of interest in this!
>>
>> Pine
>>
>> On Tue, Feb 2, 2016 at 11:43 PM, Quim Gil  wrote:
>>
>>>
>>>
>>> On Tue, Feb 2, 2016 at 11:14 PM, Dan Andreescu >> > wrote:

 Yes there is, a group of students from Sweden are working on the first
 attempt.

>>>
>>> Is there a URL to learn more (i.e. a Phabricator task)? This is
>>> interesting news, and we might want to advertize.
>>>
>>> --
>>> Quim Gil
>>> Engineering Community Manager @ Wikimedia Foundation
>>> http://www.mediawiki.org/wiki/User:Qgil
>>>
>>>
>>>
>>
>>
>>
>
>
>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview stats tools

2016-02-02 Thread Dan Andreescu
> 2. Is there a replacement for stats.grok.se planned or already available?
> A reliable substitute would be great, and it would be nice if we could
> either replace the existing on-wiki "page view statistics" link or add a
> supplemental link to the new resource. Apologizes if this information was
> already published and I missed it.

Yes there is, a group of students from Sweden are working on the first
attempt. If that doesn't succeed, others including our team have plans to
help.


___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview stats tools

2016-02-02 Thread Pine W
https://phabricator.wikimedia.org/T120497. It's great to see the amount of
interest in this!

Pine

On Tue, Feb 2, 2016 at 11:43 PM, Quim Gil  wrote:

>
>
> On Tue, Feb 2, 2016 at 11:14 PM, Dan Andreescu 
> wrote:
>>
>> Yes there is, a group of students from Sweden are working on the first
>> attempt.
>>
>
> Is there a URL to learn more (i.e. a Phabricator task)? This is
> interesting news, and we might want to advertize.
>
> --
> Quim Gil
> Engineering Community Manager @ Wikimedia Foundation
> http://www.mediawiki.org/wiki/User:Qgil
>
>
>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview stats tools

2016-02-02 Thread Quim Gil
On Tue, Feb 2, 2016 at 11:14 PM, Dan Andreescu 
wrote:
>
> Yes there is, a group of students from Sweden are working on the first
> attempt.
>

Is there a URL to learn more (i.e. a Phabricator task)? This is interesting
news, and we might want to advertise.

-- 
Quim Gil
Engineering Community Manager @ Wikimedia Foundation
http://www.mediawiki.org/wiki/User:Qgil
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview stats tools

2016-02-01 Thread Pine W
Cool, thank you Nemo.

Pine

On Sun, Jan 31, 2016 at 12:50 AM, Federico Leva (Nemo) 
wrote:

> Pine W, 31/01/2016 09:07:
>
>> Apologizes if this information was already published and I missed it.
>>
>
> https://phabricator.wikimedia.org/T120497
> https://phabricator.wikimedia.org/T43327
>
> Nemo
>
>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview stats tools

2016-02-01 Thread Madhumitha Viswanathan
On Sun, Jan 31, 2016 at 12:07 AM, Pine W  wrote:

> Hi Analytics folks,
>
> My understanding is that the new pageview definition, which excludes
> automata to a certain extent, is now published. I have a few questions:
>
> 1. Is stats.grok.se already transitioned to the new definition, or will
> it?
>
Not currently. The closest available thing that uses the new definition is
the pageview API demo, but since this is a demo we are not actively adding
features or fixing bugs in it. The proposed replacement tool will serve that
purpose in the future.

>
> 2. Is there a replacement for stats.grok.se planned or already available?
> A reliable substitute would be great, and it would be nice if we could
> either replace the existing on-wiki "page view statistics" link or add a
> supplemental link to the new resource.
>
> Apologizes if this information was already published and I missed it.
>
> Thanks,
>
> Pine
>
>
>


-- 
--Madhu :)
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview stats tools

2016-01-31 Thread Federico Leva (Nemo)

Pine W, 31/01/2016 09:07:

Apologizes if this information was already published and I missed it.


https://phabricator.wikimedia.org/T120497
https://phabricator.wikimedia.org/T43327

Nemo

___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] pageview API discrepancy

2016-01-25 Thread James Salsman
> Well, the first version looks at December to January, and the second
> at November to December, so it looks like an implementation error.

No, sorry, it was my mistake somehow. I must have reset the calendars
back a month. Sorry for the false alarm.

___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] pageview API discrepancy

2016-01-25 Thread James Salsman
> do you mean you screenshotted it at a different date?

Yes, January 23. The URL is identical.

___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] pageview API discrepancy

2016-01-25 Thread Oliver Keyes
Having no idea where the first image is sourced...do you mean you
screenshotted it at a different date? What's the actual delta in
methods here?

On 25 January 2016 at 10:25, James Salsman  wrote:
> Why is there such a difference since January 10 on
> http://i.imgur.com/rA1yUaH.png
> compared to 
> https://analytics.wmflabs.org/demo/pageview-api/#articles=Hillary_Clinton,Bernie_Sanders=2015-11-01=2015-12-22=enwiki
> ?
>
> Given the corresponding uptick at
> http://traffic.alexa.com/graph?u=http%3A%2F%2Fberniesanders.com=http%3A%2F%2Fhillaryclinton.com=http%3A%2F%2Fdonaldjtrump.com=1=400=220=n=3m=e6f3fc
> I am inclined to believe that the earlier version is correct.
>
> Has the data been adjusted?
>



-- 
Oliver Keyes
Count Logula
Wikimedia Foundation

___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview stats tool

2016-01-15 Thread Nuria Ruiz
Trying again, adding analytics@ (public e-mail list)

On Fri, Jan 15, 2016 at 5:22 AM, Marcel Ruiz Forns 
wrote:

> I also think we should start with exposing the 3 api's endpoints in a GUI,
> which - as Dan says - we know respond to community interests. And then ask
> the community for more input, that could mean improvements to the tool, new
> endpoints or completely new ideas...
>
> On Thu, Jan 14, 2016 at 10:45 PM, Dan Andreescu 
> wrote:
>
>> I'm ok if people want to take an iterative approach, I just want to point
>> out that the usage information is not very indicative of value at this
>> point.  The API is not widely used and the per-article endpoint is expected
>> to be hit much much more than per-project or top simply because the queries
>> are many orders of magnitude more granular.  So we can't really judge
>> importance from that comparison.
>>
>> On Thu, Jan 14, 2016 at 4:43 PM, Leila Zia  wrote:
>>
>>>
>>> On Thu, Jan 14, 2016 at 1:09 PM, Dan Andreescu >> > wrote:
>>>
 My question is: How are we going to define the requirements for the
> tool? I was planning to get some community input on defining which stats
> would help contributors the most. What do you think?
>

 My opinion here is that we should just expose everything the pageview
 API is capable of.  It's only 3 different end points and they were chosen
 based on what the community found useful.  As we add more endpoints we can
 keep checking if visualization is important.  But of course if others have
 other more specific plans, we can wait for those tools to be built and
 iterate.

>>>
>>> ​Building up on Dan's suggestion: I'd go with communicating and/or
>>> discussing the following with the community:
>>>
>>> * the 3 different metrics we can offer a UI for
>>> * what other metrics they find useful for their work. This will help us
>>> collect information about what other kind of metrics we should offer as an
>>> end-point if we decide to add to the end-points (pageview per article
>>> by country has come up many times, for example)
>>> * whether they consider the wish as satisfied if we offer a UI for the
>>> 3 different metrics, and perhaps over time add more metrics to the UI
>>> as they become available (not necessarily in 2016).
>>>
>>> Leila
>>>
>>>
>>>
>>
>
>
> --
> *Marcel Ruiz Forns*
> Analytics Developer
> Wikimedia Foundation
>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview stats tool

2016-01-15 Thread Alex Druk
Hi all,

My two cents on the discussion about endpoints for the pageview API:
1) stats for categories that include all subcategories and all pages,
2) include redirects in article counts

All the best,

On Fri, Jan 15, 2016 at 6:05 PM, Nuria Ruiz  wrote:

> Trying again, adding analytics@ (public e-mail list)
>
> On Fri, Jan 15, 2016 at 5:22 AM, Marcel Ruiz Forns 
> wrote:
>
>> I also think we should start with exposing the 3 api's endpoints in a
>> GUI, which - as Dan says - we know respond to community interests. And then
>> ask the community for more input, that could mean improvements to the tool,
>> new endpoints or completely new ideas...
>>
>> On Thu, Jan 14, 2016 at 10:45 PM, Dan Andreescu > > wrote:
>>
>>> I'm ok if people want to take an iterative approach, I just want to
>>> point out that the usage information is not very indicative of value at
>>> this point.  The API is not widely used and the per-article endpoint is
>>> expected to be hit much much more than per-project or top simply because
>>> the queries are many orders of magnitude more granular.  So we can't really
>>> judge importance from that comparison.
>>>
>>> On Thu, Jan 14, 2016 at 4:43 PM, Leila Zia  wrote:
>>>

 On Thu, Jan 14, 2016 at 1:09 PM, Dan Andreescu  wrote:

> My question is: How are we going to define the requirements for the
>> tool? I was planning to get some community input on defining which stats
>> would help contributors the most. What do you think?
>>
>
> My opinion here is that we should just expose everything the pageview
> API is capable of.  It's only 3 different end points and they were chosen
> based on what the community found useful.  As we add more endpoints we can
> keep checking if visualization is important.  But of course if others have
> other more specific plans, we can wait for those tools to be built and
> iterate.
>

 ​Building up on Dan's suggestion: I'd go with communicating and/or
 discussing the following with the community:

 * the 3 different metrics we can offer a UI for
 * what other metrics they find useful for their work. This will help us
 collect information about what other kind of metrics we should offer as an
 end-point if we decide to add to the end-points (pageview per article
 by country has come up many times, for example)
 * whether they consider the wish as satisfied if we offer a UI for the
 3 different metrics, and perhaps over time add more metrics to the UI
 as they become available (not necessarily in 2016).

 Leila



>>>
>>
>>
>> --
>> *Marcel Ruiz Forns*
>> Analytics Developer
>> Wikimedia Foundation
>>
>
>
>
>


-- 
Thank you.

Alex Druk, PhD
wikipediatrends.com
alex.d...@gmail.com
(775) 237-8550 Google voice
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview stats tool

2016-01-15 Thread Felix J. Scholz
I think it would be interesting to be able to search for articles by views
while retaining the existing qualifiers.

An example query might be: list articles with 500-750 views between
12/15/2015 and 12/17/2015 (or, if that is easier, just one day), counting
only real users (no bots/spiders) accessing from mobile devices.

The data is already there; it is just a different way of accessing it.

While this is currently already possible, crawling a whole project requires
millions of API requests (at least for the bigger Wikipedias like en or de).

Best,
Felix


On Fri, Jan 15, 2016 at 2:31 PM, Oliver Keyes  wrote:

> Those sound like relatively advanced features a bit beyond the initial
> offering, but like useful things to provide in the long-term, yeah.
> I'm not sure what the status of the redirects inclusion (which is sort
> of a question about the underlying data source rather than the
> endpoint) is.
>
> On 15 January 2016 at 11:28, Alex Druk  wrote:
> > Hi all,
> >
> > My two cents to discussion about endpoints to pageview API:
> > 1) stats for categories that include all subcats and all pages,
> > 2) include redirects to article counts
> >
> > All the best,
> >
> > On Fri, Jan 15, 2016 at 6:05 PM, Nuria Ruiz  wrote:
> >>
> >> Trying again, adding analytics@ (public e-mail list)
> >>
> >> On Fri, Jan 15, 2016 at 5:22 AM, Marcel Ruiz Forns <
> mfo...@wikimedia.org>
> >> wrote:
> >>>
> >>> I also think we should start with exposing the 3 api's endpoints in a
> >>> GUI, which - as Dan says - we know respond to community interests. And
> then
> >>> ask the community for more input, that could mean improvements to the
> tool,
> >>> new endpoints or completely new ideas...
> >>>
> >>> On Thu, Jan 14, 2016 at 10:45 PM, Dan Andreescu
> >>>  wrote:
> 
>  I'm ok if people want to take an iterative approach, I just want to
>  point out that the usage information is not very indicative of value
> at this
>  point.  The API is not widely used and the per-article endpoint is
> expected
>  to be hit much much more than per-project or top simply because the
> queries
>  are many orders of magnitude more granular.  So we can't really judge
>  importance from that comparison.
> 
>  On Thu, Jan 14, 2016 at 4:43 PM, Leila Zia 
> wrote:
> >
> >
> > On Thu, Jan 14, 2016 at 1:09 PM, Dan Andreescu
> >  wrote:
> >>>
> >>> My question is: How are we going to define the requirements for the
> >>> tool? I was planning to get some community input on defining which
> stats
> >>> would help contributors the most. What do you think?
> >>
> >>
> >> My opinion here is that we should just expose everything the
> pageview
> >> API is capable of.  It's only 3 different end points and they were
> chosen
> >> based on what the community found useful.  As we add more endpoints
> we can
> >> keep checking if visualization is important.  But of course if
> others have
> >> other more specific plans, we can wait for those tools to be built
> and
> >> iterate.
> >
> >
> > Building up on Dan's suggestion: I'd go with communicating and/or
> > discussing the following with the community:
> >
> > * the 3 different metrics we can offer a UI for
> > * what other metrics they find useful for their work. This will help
> us
> > collect information about what other kind of metrics we should offer
> as an
> > end-point if we decide to add to the end-points (pageview per
> article by
> > country has come up many times, for example)
> > * whether they consider the wish as satisfied if we offer a UI for
> the
> > 3 different metrics, and perhaps over time add more metrics to the
> UI as
> > they become available (not necessarily in 2016).
> >
> > Leila
> >
> >
> 
> >>>
> >>>
> >>>
> >>> --
> >>> Marcel Ruiz Forns
> >>> Analytics Developer
> >>> Wikimedia Foundation
> >>
> >>
> >>
> >>
> >
> >
> >
> > --
> > Thank you.
> >
> > Alex Druk, PhD
> > wikipediatrends.com
> > alex.d...@gmail.com
> > (775) 237-8550 Google voice
> >
> >
>
>
>
> --
> Oliver Keyes

Re: [Analytics] Pageview API

2015-11-18 Thread Dan Andreescu
>
> Finally! I waited so many years for a formal tool like that. Thank you!
>

Itzik, I remember your requests for this type of data from a long long time
ago, when I was just starting at the foundation!!  You and the many others
with similar requests are the people we were thanking in the announcement :)

And demo on wmflabs is great, but it will be great to add option to export
> the data to CSV file. Also, the data are only from the begging of October,
> any chance we can load past data as well?
>

So, I agree with what Madhu said that you could file a Phabricator ticket
for this.  But for now, we're not looking to build a UI for it that is
production ready.  The demo was meant just to show that it's very simple to
do so.  It took one of our engineers only a few days and under 300 lines of
code to get this done.  We'd like to be patient and see if the community at
large has an interest in running something like stats.grok.se now that the
heavy lifting of data is no longer required, and performance is guaranteed
by us within reasonable expectations.
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview API

2015-11-17 Thread Antoine Musso
Le 16/11/2015 22:50, Dan Andreescu a écrit :
> Dear Data Enthusiasts,
> 
> In collaboration with the Services team, the analytics team wishes to
> announce a public Pageview API
> .
>  


Hello,

That is great. One can probably craft a MediaWiki extension to have the
metric shown on Special:Statistics.

And let's be crazy: even add a statistics link in the sidebar for each page!

https://en.wikipedia.org/wiki/Special:Statistics


-- 
Antoine "hashar" Musso


___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview API

2015-11-17 Thread Magnus Manske
Congratulations from me as well, especially since I was probably the one
screaming for it the loudest (or the longest, or both...)

I'll now go and have a good long look at which tools to adapt, which to
rewrite, and which to invent!


On Mon, Nov 16, 2015 at 9:50 PM Dan Andreescu 
wrote:

> Dear Data Enthusiasts,
>
>
> In collaboration with the Services team, the analytics team wishes to
> announce a public Pageview API
> .
> For an example of what kind of UIs someone could build with it, check out
> this excellent demo 
> (code)
> 
> .
>
>
> The API can tell you how many times a wiki article or project is viewed
> over a certain period.  You can break that down by views from web crawlers
> or humans, and by desktop, mobile site, or mobile app.  And you can find
> the 1000 most viewed articles
> 
> on any project, on any given day or month that we have data for.  We
> currently have data back through October and we will be able to go back to
> May 2015 when the loading jobs are all done.  For more information, take a
> look at the user docs
> .
>
>
> After many requests from the community, we were really happy to finally
> make this our top priority and get it done.  Huge thanks to Gabriel, Marko,
> Petr, and Eric from Services, Alexandros and all of Ops really, Henrik for
> maintaining stats.grok, and, of course, the many community members who have
> been so patient with us all this time.
>
>
> The Research team’s Article Recommender tool
>  already uses the API to rank pages and
> determine relative importance.  Wiki Education Foundation’s dashboard
>  is going to be using it to count how
> many times an article has been viewed since a student edited it.  And there
> are other grand plans for this data like “article finder”, which will find
> low-rated articles with a lot of pageviews; this can be used by editors
> looking for high-impact work.  Join the fun, we’re happy to help get you
> started and listen to your ideas.  Also, if you find bugs or want to
> suggest improvements, please create a task in Phabricator and tag it with
> #Analytics-Backlog
> .
>
>
> So what’s next?  We can think of too many directions to go into, for
> pageview data and Wikimedia project data, in general.  We need to work with
> you to make a great plan for the next few quarters.  Please chime in here
>  with your needs.
>
>
> Team Analytics
>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview API

2015-11-17 Thread Dan Andreescu
>
> That is great. One can probably craft a MediaWiki extension to have the
> metric shown on the Special:Statistics
>
> And lets be crazy, even add a statistic link in the sidebar for each pages!
>
> https://en.wikipedia.org/wiki/Special:Statistics


The actual MediaWiki work for this only really involves reading JSON and
throwing bugs at us if anything breaks :)  So it should be easy; this was one
of our hopes as well, let's get to it!

By the way, to even better support this (with more performance / easier
discovery), we plan on providing access to this api from the wiki-specific
RESTBase URLs (eg. https://en.wikipedia.org/api/rest_v1/?doc).  That hasn't
been deployed yet because we're discussing where the best place for it is
there (opinions welcome).
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
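
Dan's point that the client side "only really involves reading JSON" can be illustrated with a short sketch against the per-article endpoint. The URL template follows the documented REST path; the User-Agent string, helper names, and example parameter values are mine, and the `{"items": [...]}` response shape is assumed from the API docs, so fetch_views is defined but left uncalled here since it needs network access.

```python
# Sketch: building a per-article Pageview API URL and decoding its JSON.
# Assumed response shape: {"items": [{"timestamp": ..., "views": ...}, ...]}.
import json
import urllib.request

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"

def per_article_url(project, article, start, end,
                    access="all-access", agent="user", granularity="daily"):
    """Build a per-article endpoint URL; start/end are YYYYMMDD strings."""
    return "/".join([BASE, project, access, agent, article,
                     granularity, start, end])

def fetch_views(url):
    """Return (timestamp, views) pairs decoded from the endpoint's JSON."""
    req = urllib.request.Request(
        url, headers={"User-Agent": "pageview-sketch/0.1"})  # identify the client
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return [(item["timestamp"], item["views"]) for item in data["items"]]

url = per_article_url("en.wikipedia", "Main_Page", "20151101", "20151107")
# fetch_views(url) would return one (timestamp, views) pair per day in range.
```

A MediaWiki extension of the kind Antoine suggests would do essentially the fetch_views step server- or client-side and render the pairs on Special:Statistics.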


Re: [Analytics] Pageview API

2015-11-17 Thread Itzik - Wikimedia Israel
Finally! I waited so many years for a formal tool like that. Thank you!

And the demo on wmflabs is great, but it would be great to add an option to
export the data to a CSV file. Also, the data only start at the beginning of
October; any chance we can load past data as well?


Itzik



*Regards,Itzik Edri*
Chairperson, Wikimedia Israel
+972-(0)-54-5878078 | http://www.wikimedia.org.il
Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment!


On Mon, Nov 16, 2015 at 11:50 PM, Dan Andreescu 
wrote:

> Dear Data Enthusiasts,
>
>
> In collaboration with the Services team, the analytics team wishes to
> announce a public Pageview API
> .
> For an example of what kind of UIs someone could build with it, check out
> this excellent demo 
> (code)
> 
> .
>
>
> The API can tell you how many times a wiki article or project is viewed
> over a certain period.  You can break that down by views from web crawlers
> or humans, and by desktop, mobile site, or mobile app.  And you can find
> the 1000 most viewed articles
> on any project, on any given day or month that we have data for.  We
> currently have data back through October and we will be able to go back to
> May 2015 when the loading jobs are all done.  For more information, take a
> look at the user docs.
>
>
> After many requests from the community, we were really happy to finally
> make this our top priority and get it done.  Huge thanks to Gabriel, Marko,
> Petr, and Eric from Services, Alexandros and all of Ops really, Henrik for
> maintaining stats.grok, and, of course, the many community members who have
> been so patient with us all this time.
>
>
> The Research team’s Article Recommender tool
> already uses the API to rank pages and
> determine relative importance.  Wiki Education Foundation’s dashboard
> is going to be using it to count how
> many times an article has been viewed since a student edited it.  And there
> are other grand plans for this data like “article finder”, which will find
> low-rated articles with a lot of pageviews; this can be used by editors
> looking for high-impact work.  Join the fun, we’re happy to help get you
> started and listen to your ideas.  Also, if you find bugs or want to
> suggest improvements, please create a task in Phabricator and tag it with
> #Analytics-Backlog.
>
>
> So what’s next?  We can think of too many directions to go into, for
> pageview data and Wikimedia project data, in general.  We need to work with
> you to make a great plan for the next few quarters.  Please chime in here
> with your needs.
>
>
> Team Analytics
>
> ___
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview API

2015-11-17 Thread Madhumitha Viswanathan
Hi Itzik,

Glad you like the API! Feel free to file a ticket in Phabricator tagged
Analytics-Backlog for your feature request to export to CSV, and we can
discuss it there. The data will date back to May 2015; we are in the
process of loading it all into Cassandra, and it should all be available
RealSoon™ :)

Best,

On Tue, Nov 17, 2015 at 5:25 PM, Itzik - Wikimedia Israel <
it...@wikimedia.org.il> wrote:

> Finally! I waited so many years for a formal tool like that. Thank you!
>
> And demo on wmflabs is great, but it will be great to add option to export
> the data to CSV file. Also, the data are only from the begging of October,
> any chance we can load past data as well?
>
>
> Itzik
>
>
>
> *Regards,Itzik Edri*
> Chairperson, Wikimedia Israel
> +972-(0)-54-5878078 | http://www.wikimedia.org.il
> Imagine a world in which every single human being can freely share in the
> sum of all knowledge. That's our commitment!
>

--
Madhumitha
Software Engineer, Analytics
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview API

2015-11-16 Thread James Forrester
On 16 November 2015 at 13:50, Dan Andreescu 
wrote:

> Dear Data Enthusiasts,
>
>
> In collaboration with the Services team, the analytics team wishes to
> announce a public Pageview API.
> For an example of what kind of UIs someone could build with it, check out
> this excellent demo (code).
>

​This is great to see. Thank you everyone for your work on this.​

​Yours,
-- 
James D. Forrester
Lead Product Manager, Editing
Wikimedia Foundation, Inc.

jforres...@wikimedia.org | @jdforrester
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview API

2015-11-16 Thread Rachel diCerbo
This is fantastic news - thank you so much, Analytics, Services, Ops, and
the communities who supported/requested this!



-- 

Rachel diCerbo
Director of Community Engagement (Product)
Wikimedia Foundation
Rdicerb (WMF) 
@a_rachel 
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview API Status update

2015-06-25 Thread Dan Andreescu
Two quick updates:

What Oliver said resonates with us; we are doing everything possible to
focus and keep the project moving instead of satisfying all possible
requirements at launch.

We have been working on our goals (not yet finalized) to include the Pageview
API by September.  There is quite a bit of puppetizing and productionizing to
do, but we are removing more and more distractions, so I personally feel
optimistic.


Also, Jon K and other internal folks, the intermediate aggregate is
available for you to query and it's updated hourly.

On Mon, Jun 15, 2015 at 9:40 AM, Oliver Keyes oke...@wikimedia.org wrote:

 Well, that's not the case; https=1 was added for the apps, and so hit
 both mobile and text varnishes. Since all pageviews go through the
 text or mobile sources, all pageviews note (implicitly or explicitly)
 their https status.

 In regards to the initial suggestion; no. Don't add new fields. Don't
 propose new fields. Don't do anything with new fields - freeze the
 definition already. We've had a pageviews definition for 6 months,
 we've had unreliability in Henrik's third-party service for 12, and
 every time there's a should we add a new field? proposal it slows
 implementing an alternative down. We need that alternative.


___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview API Status update

2015-06-15 Thread Oliver Keyes
Well, that's not the case; https=1 was added for the apps, and so hits
both mobile and text varnishes. Since all pageviews go through the
text or mobile sources, all pageviews note (implicitly or explicitly)
their https status.

In regard to the initial suggestion: no. Don't add new fields. Don't
propose new fields. Don't do anything with new fields - freeze the
definition already. We've had a pageviews definition for 6 months,
we've had unreliability in Henrik's third-party service for 12, and
every time there's a "should we add a new field?" proposal it slows
down implementing an alternative. We need that alternative.

On 15 June 2015 at 00:57, Yuri Astrakhan yastrak...@wikimedia.org wrote:
 X-analytics contains HTTPS=1 for all text hits, but not for other types of
 traffic





-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation

___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview API Status update

2015-06-14 Thread Kevin Leduc
In light of the recent switch to HTTPS, what about adding http/https
information?  Maybe it could be added to 'access_method' rather than
added as a new dimension?


On Thu, Jun 11, 2015 at 1:46 PM, Jon Katz jk...@wikimedia.org wrote:

 Hi Dan,
 Sorry for the late response to this--

 * Make a new cube that examines site versions and client information
 * Just use the private data as we're already doing, but aggregate it
 hourly or daily as needed, to make analysis much faster.

 How can I help add/keep this to/on your roadmap?
 -J

 On Fri, Jun 5, 2015 at 12:28 PM, Dan Andreescu dandree...@wikimedia.org
 wrote:

 On Fri, Jun 5, 2015 at 3:09 PM, Oliver Keyes oke...@wikimedia.org
 wrote:

 If we can't share it with the public then it seems like it shouldn't
 be part of a proposal for an API.


 Right, to clarify, this proposal is for a public data set and API.


  Thanks Dan, and apologies if these are naive questions:
 
  For mobile web can we also see beta v. stable?  This is important for
  tracking prototypes, which is one of the core product uses for this
 data.
 
  For apps can we see ios v android?


 Jon, we chose to not include that information in order to limit the
 amount of data that we'd have to deal with.  If it gets too large, it won't
 fit into PostgreSQL.  For the iOS / Android and beta / alpha versions of
 the site we can either:

 * Make a new cube that examines site versions and client information
 * Just use the private data as we're already doing, but aggregate it
 hourly or daily as needed, to make analysis much faster.

 ___
 Analytics mailing list
 Analytics@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/analytics



 ___
 Analytics mailing list
 Analytics@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/analytics


___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview API Status update

2015-06-05 Thread Oliver Keyes
My only thought is that "city" makes me uncomfortable. Did we track
down a precise use case for that in the end?

On 5 June 2015 at 09:25, Dan Andreescu dandree...@wikimedia.org wrote:
 I just posted a comment on the famous task:
 https://phabricator.wikimedia.org/T44259#1341010 :)

 Here it is for those who would rather discuss on this list:


 We have finished analyzing the intermediate hourly aggregate with all the
 columns that we think are interesting.  The data is too large to query and
 anonymize in real time.  We'd rather get an API out faster than deal with
 that problem, so we decided to produce smaller cubes [1] of data for
 specific purposes.  We have two cubes in mind and I'll explain those here.
 For each cube, we're aiming to have:

 * Direct access to a postgresql database in labs with the data
 * API access through RESTBase
 * Mondrian / Saiku access in labs for dimensional analysis
 * Data will be pre-aggregated so that any single data point has k-anonymity
 (we have not determined a good k yet)
 * Higher level aggregations will be pre-computed so they use all data

 And, the cubes are:

 **stats.grok.se Cube: basic pageview data**

 Hourly resolution.  Will serve the same purpose as stats.grok.se has served
 for so many years.  The dimensions available will be:

 * project - 'Project name from the request's host name'
 * dialect - 'Dialect from the request's path (not set if present in project
 name)'
 * page_title - 'Page title from the request's path and query'
 * access_method - 'Method used to access the pages; can be desktop, mobile
 web, or mobile app'
 * is_zero - 'Accessed through a Zero provider'
 * agent_type - 'Agent accessing the pages; can be spider or user'
 * referer_class - 'Can be internal, external, or unknown'


 **Geo Cube: geo-coded pageview data**

 Daily resolution.  Will allow researchers to track the flu, breaking news,
 etc.  Dimensions will be:

 * project - 'Project name from the request's hostname'
 * page_title - 'Page title from the request's path and query'
 * country_code - 'Country ISO code of the accessing agents (computed using the
 MaxMind GeoIP database)'
 * province - 'State / Province of the accessing agents (computed using the
 MaxMind GeoIP database)'
 * city - 'Metro area of the accessing agents (computed using the MaxMind GeoIP
 database)'


 So, if anyone wants another cube, **now** is the time to speak up.  We'll
 probably add cubes later, but it may be a while.

 [1] OLAP cubes: https://en.wikipedia.org/wiki/OLAP_cube
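As a rough illustration of what a cube like this boils down to, here is a hedged Python sketch: counting pageviews per combination of the dimensions listed above (plus an hour column), then pre-computing a higher-level rollup. The dimension names come from the proposal; the input row format is an assumption for illustration, not the real pipeline's schema.

```python
from collections import Counter

# Dimension columns of the hourly cube sketched above. The "hour" column and
# the dict-per-pageview row format are illustrative assumptions.
DIMENSIONS = ("hour", "project", "dialect", "page_title",
              "access_method", "is_zero", "agent_type", "referer_class")

def aggregate(rows):
    # Count one pageview per row, keyed by the full dimension tuple.
    counts = Counter()
    for row in rows:
        counts[tuple(row[d] for d in DIMENSIONS)] += 1
    return counts

def rollup(counts, keep=("hour", "project", "page_title")):
    # Pre-computed higher-level aggregation: sum out every dimension not kept,
    # so coarse queries use all the data without touching raw rows.
    idx = [DIMENSIONS.index(d) for d in keep]
    rolled = Counter()
    for key, views in counts.items():
        rolled[tuple(key[i] for i in idx)] += views
    return rolled
```

The rollup step is why higher-level aggregations can stay complete even when fine-grained cells are later suppressed for anonymity.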

 ___
 Analytics mailing list
 Analytics@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/analytics




-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation

___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview API Status update

2015-06-05 Thread Dan Andreescu

 Gotcha. Reading that proposal it appears to be a proposal for a
 methodology that will enable future proposals; where are the future
 proposals?


Well, so the geo cube has to guess a bit at who would find it useful in the
future.


 It also says in many countries, disease monitoring must be
 carried out at the state or metro-area level - which countries have
 to be metro-level? Who are we risking the entire reader population
 for, here? Is it one country, or ten, or?

 For what it's worth I love the idea of this kind of live stream. But I
 want to make sure that how the various chunks are being prioritised,
 and how critical they are to the outside world, is correlated - and is
 correlated with the underlying data's sensitivity, at that. If we're
 introducing risks by going down to city level and the actual use cases
 for city level data are limited, let's not do that - but this proposal
 doesn't provide thoughts on how limited those use cases are. It just
 says that it's required in some countries.


I agree with you, but I'm not sure the data is risky if it's k-anonymous.
Most likely, just doing that will limit the countries for which metro-level
data is available.
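The suppression step being discussed can be sketched in a few lines. To be clear, this is threshold-based suppression of small cells, a simple approximation of the k-anonymity goal rather than a full anonymization algorithm, and the dimension tuples are illustrative:

```python
def k_anonymize(counts, k):
    # counts: mapping from a dimension tuple, e.g. (country, province, city,
    # page_title), to a pageview count. Drop any cell below k, so no published
    # data point describes fewer than k views.
    return {dims: views for dims, views in counts.items() if views >= k}
```

Applied to the geo cube, cells for small metro areas simply disappear, which is exactly the "will limit the countries for which metro-level data is available" effect.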
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview API Status update

2015-06-05 Thread Oliver Keyes
Gotcha. Reading that proposal, it appears to be a proposal for a
methodology that will enable future proposals; where are the future
proposals? It also says "in many countries, disease monitoring must be
carried out at the state or metro-area level" - which countries have
to be metro-level? Who are we risking the entire reader population
for, here? Is it one country, or ten, or more?

For what it's worth I love the idea of this kind of live stream. But I
want to make sure that how the various chunks are being prioritised,
and how critical they are to the outside world, is correlated - and is
correlated with the underlying data's sensitivity, at that. If we're
introducing risks by going down to city level and the actual use cases
for city level data are limited, let's not do that - but this proposal
doesn't provide thoughts on how limited those use cases are. It just
says that it's required in some countries.

On 5 June 2015 at 09:35, Dan Andreescu dandree...@wikimedia.org wrote:
 My only thought is that city makes me uncomfortable. Did we track
 down a precise use case for that in the end?


 Yes, the Los Alamos National Lab folks' proposal:
 https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_pageviews

 We talked to them yesterday and it seems the time granularity is not as
 important.  That's why that dataset is *daily* and the other one is
 *hourly*.  Either way, these will be k-anonymized at any level.  Once we
 have some data up, though, I'd love for people who are good at this to try
 and attack the datasets in combination and from different points of view
 like t-closeness, etc.  I don't want to leak any info and any help on that
 is appreciated 'cause it's a hard problem.

 ___
 Analytics mailing list
 Analytics@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/analytics




-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation

___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview API Status update

2015-06-05 Thread Yuri Astrakhan
Yep, the Zero tag would be very useful. Also, some things the Zero team found
highly useful are: via (the proxy value in X-Analytics), https vs. http, and
the zero subdomain vs. the m subdomain.
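For reference, the X-Analytics header is a semicolon-separated list of key=value pairs, so pulling out fields like https, zero, or proxy is a one-liner. A minimal sketch; the example values below are made up, not real carrier codes:

```python
def parse_x_analytics(header):
    # Parse an X-Analytics header such as "https=1;zero=123-45;proxy=Opera"
    # into a dict of string fields. Empty segments are skipped.
    fields = {}
    for part in header.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition("=")
        fields[key] = value
    return fields
```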
On Jun 5, 2015 14:51, Kevin Leduc ke...@wikimedia.org wrote:

 I came across another potential requirement from the WP Zero team:
 add the x-analytics['zero'] to the dimensions.  This would allow the zero
 team to get pageviews per partner carrier.  Our partners are interested in
 this data, however, they don't want to share it with anyone as it is
 competitive data, and we can't make it public.




___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview API Status update

2015-06-05 Thread Kevin Leduc
I came across another potential requirement from the WP Zero team:
add x-analytics['zero'] to the dimensions.  This would allow the Zero
team to get pageviews per partner carrier.  Our partners are interested in
this data; however, they don't want to share it with anyone, as it is
competitive data, and we can't make it public.


On Fri, Jun 5, 2015 at 10:51 AM, Jon Katz jk...@wikimedia.org wrote:

 Thanks Dan, and apologies if these are naive questions:

 For mobile web can we also see beta v. stable?  This is important for
 tracking prototypes, which is one of the core product uses for this data.

 For apps can we see ios v android?





___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Pageview API Status update

2015-06-05 Thread Jon Katz
Thanks Dan, and apologies if these are naive questions:

For mobile web, can we also see beta vs. stable?  This is important for
tracking prototypes, which is one of the core product uses for this data.

For apps, can we see iOS vs. Android?



On Fri, Jun 5, 2015 at 8:39 AM, Oliver Keyes oke...@wikimedia.org wrote:

 On 5 June 2015 at 10:38, Dan Andreescu dandree...@wikimedia.org wrote:
  Gotcha. Reading that proposal it appears to be a proposal for a
  methodology that will enable future proposals; where are the future
  proposals?
 
 
  Well, so the geo cube has to guess a bit at who would find it useful in
 the
  future.
 
 
  It also says in many countries, disease monitoring must be
  carried out at the state or metro-area level - which countries have
  to be metro-level? Who are we risking the entire reader population
  for, here? Is it one country, or ten, or?
 
  For what it's worth I love the idea of this kind of live stream. But I
  want to make sure that how the various chunks are being prioritised,
  and how critical they are to the outside world, is correlated - and is
  correlated with the underlying data's sensitivity, at that. If we're
  introducing risks by going down to city level and the actual use cases
  for city level data are limited, let's not do that - but this proposal
  doesn't provide thoughts on how limited those use cases are. It just
  says that it's required in some countries.
 
 
  I agree with you, but I'm not sure the data is risky if it's k-anonymous.
  Most likely, just doing that will limit the countries for which metro
 level
  data is available.

 I don't think it is if it is! As you said, though, we need to hammer
 on it for a while to make absolutely sure it's okay, and using
 lower-resolution data would not only make this easier but also reduce
 the cost of getting people wrong (geolocating people to MA is less
 dangerous than geolocating them to Arlington)

 


Re: [Analytics] Pageview API Status update

2015-06-05 Thread Oliver Keyes
On 5 June 2015 at 10:38, Dan Andreescu dandree...@wikimedia.org wrote:
 Gotcha. Reading that proposal it appears to be a proposal for a
 methodology that will enable future proposals; where are the future
 proposals?


 Well, so the geo cube has to guess a bit at who would find it useful in the
 future.


 It also says in many countries, disease monitoring must be
 carried out at the state or metro-area level - which countries have
 to be metro-level? Who are we risking the entire reader population
 for, here? Is it one country, or ten, or?

 For what it's worth I love the idea of this kind of live stream. But I
 want to make sure that how the various chunks are being prioritised,
 and how critical they are to the outside world, is correlated - and is
 correlated with the underlying data's sensitivity, at that. If we're
 introducing risks by going down to city level and the actual use cases
 for city level data are limited, let's not do that - but this proposal
 doesn't provide thoughts on how limited those use cases are. It just
 says that it's required in some countries.


 I agree with you, but I'm not sure the data is risky if it's k-anonymous.
 Most likely, just doing that will limit the countries for which metro level
 data is available.

I don't think it is if it is! As you said, though, we need to hammer
on it for a while to make absolutely sure it's okay, and using
lower-resolution data would not only make this easier but also reduce
the cost of getting people wrong (geolocating people to MA is less
dangerous than geolocating them to Arlington).
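[Archive note: the k-anonymity roll-up being discussed above can be sketched roughly as follows. This is an illustrative sketch, not the actual geo-cube implementation: the record shape (country, city, readers), the threshold `K`, and the "OTHER" roll-up bucket are all assumptions for the example.]

```python
# Sketch: release a (country, city) pageview bucket only if it has at
# least K readers; otherwise fold it into a coarser country-level
# "OTHER" bucket. Real implementations would also have to check that
# the merged bucket itself clears the threshold.
from collections import defaultdict

K = 100  # assumed minimum readers per released bucket


def k_anonymize(rows, k=K):
    """rows: iterable of (country, city, readers) tuples."""
    released = defaultdict(int)
    for country, city, readers in rows:
        if readers >= k:
            released[(country, city)] += readers
        else:
            # Too small to release on its own: roll up to country level.
            released[(country, "OTHER")] += readers
    return dict(released)


sample = [
    ("US", "Boston", 5000),
    ("US", "Arlington", 40),   # below K: suppressed, per Oliver's example
    ("US", "Cambridge", 60),
]
print(k_anonymize(sample))
# {('US', 'Boston'): 5000, ('US', 'OTHER'): 100}
```

Lower-resolution releases fall out naturally: the fewer readers a country has per city, the more of its traffic ends up in the coarse bucket.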






-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation



Re: [Analytics] Pageview API Status update

2015-06-05 Thread Dan Andreescu
On Fri, Jun 5, 2015 at 3:09 PM, Oliver Keyes oke...@wikimedia.org wrote:

 If we can't share it with the public then it seems like it shouldn't
 be part of a proposal for an API.


Right, to clarify, this proposal is for a public data set and API.


  Thanks Dan, and apologies if these are naive questions:
 
  For mobile web can we also see beta v. stable?  This is important for
  tracking prototypes, which is one of the core product uses for this
 data.
 
  For apps can we see ios v android?


Jon, we chose to not include that information in order to limit the amount
of data that we'd have to deal with.  If it gets too large, it won't fit
into PostgreSQL.  For the iOS / Android and beta / alpha versions of the
site we can either:

* Make a new cube that examines site versions and client information
* Just use the private data as we're already doing, but aggregate it hourly
or daily as needed, to make analysis much faster.
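[Archive note: the second option above, pre-aggregating the private request data hourly, amounts to a rollup like the following. This is a hedged stdlib-only sketch; the record shape (timestamp, platform) is an assumption, not the real request schema.]

```python
# Sketch: collapse raw per-request records into (hour, platform) counts
# so that later analysis touches small aggregates instead of raw
# private rows.
from collections import Counter
from datetime import datetime


def hourly_rollup(requests):
    """requests: iterable of (timestamp, platform) tuples."""
    counts = Counter()
    for ts, platform in requests:
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        counts[(hour, platform)] += 1
    return counts


raw = [
    (datetime(2015, 6, 5, 10, 5), "ios"),
    (datetime(2015, 6, 5, 10, 40), "ios"),
    (datetime(2015, 6, 5, 11, 10), "android"),
]
print(hourly_rollup(raw))
```

The same shape works for a daily rollup by also zeroing the hour, and the output is small enough to fit comfortably in PostgreSQL.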