Re: [Analytics] API usage advice

2016-04-29 Thread Kevin Leduc
Hi Sander,

Erik shared the link for media files.  If you want tons of pageview data,
it can be downloaded here:
https://dumps.wikimedia.org/other/analytics/

There is also a pageview API if you are looking for view counts to specific
articles (thus avoiding downloads of tons of data):
https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI
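
A minimal sketch of how a per-article request could be assembled, in case it helps.  It only builds the URL and fetches nothing.  The per-article path layout and the default values ("all-access", "user", "daily") are taken from the public API documentation rather than from this thread, so treat them as assumptions and check them against the docs linked above.  The helper name per_article_url is made up for illustration.

```python
from urllib.parse import quote

# Assumed base path of the per-article endpoint (see the PageviewAPI docs).
BASE_URL = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def per_article_url(project, article, start, end,
                    access="all-access", agent="user", granularity="daily"):
    """Build a per-article pageview URL; start/end are YYYYMMDD strings."""
    # Titles use underscores instead of spaces and must be percent-encoded.
    title = quote(article.replace(" ", "_"), safe="")
    return "/".join([BASE_URL, project, access, agent, title,
                     granularity, start, end])


print(per_article_url("en.wikipedia", "Trevi Fountain", "20160401", "20160428"))
```

Fetching the resulting URL with any HTTP client should return JSON with one item per day.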



On Thu, Apr 28, 2016 at 6:32 AM, Erik Zachte  wrote:

> Hi Sander,
>
>
>
> Not an API but probably relevant, this data stream on media (binary file)
> downloads:
>
> https://wikitech.wikimedia.org/wiki/Analytics/Data/Mediacounts
>
>
>
> Cheers,
>
> Erik Zachte
>
>
>
> *From:* Analytics [mailto:analytics-boun...@lists.wikimedia.org] *On
> Behalf Of *Sander Ubink
> *Sent:* Thursday, April 28, 2016 14:28
> *To:* analytics@lists.wikimedia.org
> *Subject:* [Analytics] API usage advice
>
>
>
> Hi all,
>
>
>
> as a new subscriber to this mailing list I would like to introduce myself.
> My name is Sander, I'm a student and I'm currently working on a project
> using Wikimedia APIs. A Dutch cultural institution has requested a team of
> students to analyze how their uploaded material is being used. For
> example, they would like to know where their material is reused, how many
> visitors view the pages, whether visitors open the media on the page,
> etc. Which APIs would you suggest we look at that could have
> valuable information? Also, is there any general documentation about the
> various APIs? Any advice would be greatly appreciated.
>
>
>
> Regards,
>
> Sander
>
> ___
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


Re: [Analytics] Unique Devices data available on API

2016-04-19 Thread Kevin Leduc
Here's another useful link to a form that helps you construct the API call:
https://wikimedia.org/api/rest_v1/?doc#!/Unique_devices_data/get_metrics_unique_devices_project_access_site_granularity_start_end
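
If you prefer to build the call in code, here is a small sketch that reproduces the example query from the announcement quoted below.  The path layout (project / access-site / granularity / start / end) is read off that example URL; the helper names month_bounds and unique_devices_url are hypothetical.

```python
import calendar

BASE_URL = "https://wikimedia.org/api/rest_v1/metrics/unique-devices"


def month_bounds(year, month):
    """Return (first_day, last_day) of a month as YYYYMMDD strings."""
    last_day = calendar.monthrange(year, month)[1]
    return (f"{year:04d}{month:02d}01",
            f"{year:04d}{month:02d}{last_day:02d}")


def unique_devices_url(project, access_site, granularity, start, end):
    """Join the path segments in the order used by the example query."""
    return "/".join([BASE_URL, project, access_site, granularity, start, end])


start, end = month_bounds(2016, 2)
print(unique_devices_url("en.wikipedia.org", "all-sites", "daily", start, end))
```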


On Tue, Apr 19, 2016 at 12:17 PM, Nuria Ruiz  wrote:

> Hello!
>
> The analytics team is happy to announce that the Unique Devices data is
> now available to be queried programmatically via an API.
>
> This means that getting the daily number of unique devices [1] for English
> Wikipedia for the month of February 2016, for all sites (desktop and
> mobile) is as easy as launching this query:
>
>
> https://wikimedia.org/api/rest_v1/metrics/unique-devices/en.wikipedia.org/all-sites/daily/20160201/20160229
>
> You can get started by taking a look at our docs:
> https://wikitech.wikimedia.org/wiki/Analytics/Unique_Devices#Quick_Start
>
> If you are not familiar with the Unique Devices data, the main thing you
> need to know is that it is a good proxy metric for Unique Users; more
> info below.
>
> Since 2009, the Wikimedia Foundation used comScore to report data about
> unique web visitors.  In January 2016, however, we decided to stop
> reporting comScore numbers [2] because of certain limitations in the
> methodology; these limitations translated into misreported mobile usage.
> We are now ready to replace comScore numbers with the Unique Devices
> dataset. While unique devices does not equal unique visitors, it is a good
> proxy for that metric, meaning that a major increase in the number of
> unique devices is likely to come from an increase in distinct users. We
> understand that counting uniques raises fairly big privacy concerns, so we
> use a very privacy-conscious way to count unique devices: it does not
> include any cookie by which your browser history can be tracked [3].
>
>
> [1] https://meta.wikimedia.org/wiki/Research:Unique_Devices
> [2] https://meta.wikimedia.org/wiki/ComScore/Announcement
> [3]
> https://meta.wikimedia.org/wiki/Research:Unique_Devices#How_do_we_count_unique_devices.3F
>
>


Re: [Analytics] Researcher Student

2016-04-11 Thread Kevin Leduc
I think Nima was referring to articles of monuments / places of interest
that have GPS coordinates in them.  For example, the Trevi Fountain is at
these coordinates: 41.902773°N 12.485952°E

By joining pageviews and coordinate data, you could create heat maps that
may correlate with actual tourist traffic.


[1] https://it.wikipedia.org/wiki/Trevi_(rione_di_Roma)
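
To make the join concrete, here is a rough sketch that bins pageviews into a latitude/longitude grid, which is the raw material for a heat map.  The Trevi Fountain coordinates come from this message; the second article's coordinates, all view counts, the 0.01 degree cell size, and the function name heat_grid are assumptions for illustration only.

```python
import math
from collections import defaultdict


def heat_grid(pageviews, coordinates, cell_deg=0.01):
    """Sum pageviews into grid cells of cell_deg degrees.

    pageviews:   {title: views}
    coordinates: {title: (lat, lon)}
    Returns {(lat_index, lon_index): total_views}.
    """
    grid = defaultdict(int)
    for title, views in pageviews.items():
        if title not in coordinates:
            continue  # articles without coordinates cannot be placed on a map
        lat, lon = coordinates[title]
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        grid[cell] += views
    return dict(grid)


# Hypothetical view counts; only the Trevi coordinates are from the thread.
views = {"Fontana_di_Trevi": 1200, "Nearby_article": 300, "No_coordinates": 50}
coords = {"Fontana_di_Trevi": (41.902773, 12.485952),
          "Nearby_article": (41.9029, 12.4851)}
print(heat_grid(views, coords))
```

Each cell total can then be handed to any plotting library; a coarser cell_deg gives a smoother map.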


On Tue, Apr 5, 2016 at 4:07 AM, Nima Dashtban 
wrote:

> Hi there,
>
> Hope my email finds you well. My name is Nima Dashtban and I'm a computer
> science student at Ca' Foscari University of Venice, Italy.
>
> I am investigating these access logs of Wikipedia pages:
> https://dumps.wikimedia.org/other/pagecounts-raw/
>
> In particular, I would like to build up a DB of the time series of
> accesses to (Italian) Wikipedia pages that have a GPS position, i.e.
> Wikipedia pages that refer to geographical points of interest. I think
> that such data could be useful as a predictive signal of the interest of
> potential visitors to such geographical places.
>
> Any help, even just word on whether this is possible, would be huge for me.
>
> Sincerely and Regards,
> Nima Dashtban
>


Re: [Analytics] Researcher Student

2016-04-07 Thread Kevin Leduc
Hi Nima,

It should be possible, and it is interesting to merge geodata with
pageviews.  Newer pageview data may be easier to work with:
https://dumps.wikimedia.org/other/analytics/

I wonder whether the date when GPS data was added to an article has any
impact on pageviews.  It may be easier to assume it does not, so
you don't have to look at the article's history as well.

Wikidata will also be an easy way to query for GPS data.  Check out this
mapping of data with coordinates:
https://ddll.inf.tu-dresden.de/web/Wikidata/Maps-06-2015/en
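
One possible way to get the coordinates out of Wikidata is its SPARQL query service.  The sketch below only builds the query text and parses a coordinate literal; it sends no request.  It assumes the property P625 ("coordinate location"), the public endpoint at query.wikidata.org, and the "Point(longitude latitude)" literal format; none of these appear in this thread, so please verify them before relying on the result.  The function names are made up.

```python
import re

# Assumed public endpoint of the Wikidata Query Service.
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def coordinates_query(language="it", limit=100):
    """SPARQL for items that have coordinates (P625) and an article on
    <language>.wikipedia.org."""
    return f"""
SELECT ?item ?article ?coord WHERE {{
  ?item wdt:P625 ?coord .
  ?article schema:about ?item ;
           schema:isPartOf <https://{language}.wikipedia.org/> .
}}
LIMIT {limit}
""".strip()


_POINT = re.compile(r"Point\(([-+\d.eE]+) ([-+\d.eE]+)\)")


def parse_point(wkt):
    """Convert a 'Point(lon lat)' literal into a (lat, lon) tuple."""
    match = _POINT.fullmatch(wkt.strip())
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    lon, lat = (float(group) for group in match.groups())
    return lat, lon


print(coordinates_query())
print(parse_point("Point(12.485952 41.902773)"))
```

Note the longitude-first order in the literal, which is easy to get backwards when joining with other data.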



On Tue, Apr 5, 2016 at 4:07 AM, Nima Dashtban 
wrote:

> Hi there,
>
> Hope my email finds you well. My name is Nima Dashtban and I'm a computer
> science student at Ca' Foscari University of Venice, Italy.
>
> I am investigating these access logs of Wikipedia pages:
> https://dumps.wikimedia.org/other/pagecounts-raw/
>
> In particular, I would like to build up a DB of the time series of
> accesses to (Italian) Wikipedia pages that have a GPS position, i.e.
> Wikipedia pages that refer to geographical points of interest. I think
> that such data could be useful as a predictive signal of the interest of
> potential visitors to such geographical places.
>
> Any help, even just word on whether this is possible, would be huge for me.
>
> Sincerely and Regards,
> Nima Dashtban
>


Re: [Analytics] Easier way to get and/or work with page counts?

2016-02-29 Thread Kevin Leduc
Hi Dominic,

There's an API you can use to get pageviews [1].  If you are using Python,
JS or R, there are convenient libraries that make it even easier [2].

The Wikimedia Foundation does not maintain stats.grok.se; it was built
years ago and the server has not been very reliable lately.  The new API
maintained by the Foundation is the best way to go.

[1] https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI
[2] https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI#Clients
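
If you would rather not depend on a client library, the response is plain JSON and easy to handle yourself.  A minimal sketch follows; it assumes a per-article response holds an "items" list whose entries carry a "timestamp" (YYYYMMDDHH) and a "views" count, which is my reading of the API docs and not something stated in this thread.  The sample payload and its numbers are invented.

```python
import json


def daily_views(response_text):
    """Map YYYYMMDD -> views from a per-article JSON response body."""
    payload = json.loads(response_text)
    # Timestamps look like 2016020100; the first 8 characters are the date.
    return {item["timestamp"][:8]: item["views"] for item in payload["items"]}


# Invented sample in the assumed response shape.
sample = json.dumps({"items": [
    {"article": "Cat", "timestamp": "2016020100", "views": 10},
    {"article": "Cat", "timestamp": "2016020200", "views": 12},
]})
print(daily_views(sample))
```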


On Mon, Feb 29, 2016 at 7:33 AM, Oliver Keyes  wrote:

> There are also R (https://github.com/Ironholds/pageviews) and Python
> (https://github.com/mediawiki-utilities/python-mwviews) clients
> depending on your language of preference :)
>
> On 29 February 2016 at 08:40, Thomas Steiner  wrote:
> > Hi Dominic,
> >
> > You might be interested in Pageviews.js:
> > https://github.com/tomayac/pageviews.js.
> >
> > Cheers,
> > Tom
> >
> >
> > --
> > Dr. Thomas Steiner, Employee (blog.tomayac.com, twitter.com/tomayac)
> >
> > Google Germany GmbH, ABC-Str. 19, 20354 Hamburg
> > Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle
> > Registergericht und -nummer: Hamburg, HRB 86891
> >
> > -BEGIN PGP SIGNATURE-
> > Version: GnuPG v2.0.29 (GNU/Linux)
> >
> >
> iFy0uwAntT0bE3xtRa5AfeCheCkthAtTh3reSabiGbl0ck0fjumBl3DCharaCTersAttH3b0ttom.hTtP5://xKcd.c0m/1181/
> > -END PGP SIGNATURE-
> >
> >
>
>
>
> --
> Oliver Keyes
> Count Logula
> Wikimedia Foundation
>


[Analytics] [Interesting] n-gram analysis of reddit

2016-02-20 Thread Kevin Leduc
I came across a data visualizer that looks a lot like the pageview analysis
tool [1].  It shows the frequency of words in comments on reddit.com: the
n-gram visualizer [2].  If only that dataset was public ;-)


[1]
https://tools.wmflabs.org/pageviews/#start=2016-01-31&end=2016-02-19&project=en.wikipedia.org&platform=all-access&agent=user&pages=Cat|Dog

[2]
http://projects.fivethirtyeight.com/reddit-ngram/?keyword=global_warming.climate_change&start=20071014&end=20150831&smoothing=30


Re: [Analytics] Vital signs no longer displaying legacy pageview data

2016-01-22 Thread Kevin Leduc
I often consult vital signs (usually leading up to the monthly metrics
meeting) to see how our traffic is doing.  I'm always hoping it is bouncing
back up and it's something that could be presented at the metrics meeting.
I wish Vital Signs had monthly pageview data.   I know I can use the
smoothing function... but I'm looking for a quick lookup of a number I can
report back: e.g. last month we had 15 billion pageviews.  The reading team
is running Hive queries to get this, and it's one of their KPIs:
https://www.mediawiki.org/wiki/Wikimedia_Product#Reading .

Anyway, I think the ideal would be to have druid + a visualization package
instead of vital signs and then I could filter for exactly what I want, and
I wouldn't have to ask if it includes bots ;-)
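
For the quick monthly number, one stopgap is to roll daily counts up per month.  This is only a sketch under the assumption that daily totals keyed by YYYYMMDD are already at hand (for example, exported from a query); the figures below are invented and the function name is hypothetical.

```python
from collections import Counter


def monthly_totals(daily_counts):
    """Roll {YYYYMMDD: views} up into {YYYYMM: views}."""
    totals = Counter()
    for day, views in daily_counts.items():
        totals[day[:6]] += views  # the first six characters are YYYYMM
    return dict(totals)


# Invented daily totals, for illustration only.
daily = {"20151230": 500, "20151231": 520, "20160101": 480, "20160102": 510}
print(monthly_totals(daily))
```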


On Fri, Jan 22, 2016 at 1:51 AM, Joseph Allemandou <
jalleman...@wikimedia.org> wrote:

> Hi Kevin
>
> I confirm that Vital Signs pageviews DO NOT contain the traffic we flag as
> automated (see the code here
> <https://github.com/wikimedia/analytics-refinery/blob/master/oozie/projectview/hourly/transform_projectview_to_legacy_format.hql#L79>
> )
> Joseph
>
> On Fri, Jan 22, 2016 at 5:41 AM, Kevin Leduc  wrote:
>
>> Can you confirm that Vital Signs pageviews include web crawler and bot
>> traffic?
>>
>> On Thu, Jan 21, 2016 at 5:16 PM, Nuria Ruiz  wrote:
>>
>>> Hello,
>>>
>>> The UI for vital signs will no longer display legacy pageview data
>>> (pageviews calculations that used the old definition). We are working
>>> towards having one consistent pageview definition [1] in every tool that
>>> surfaces pageview data.
>>>
>>>
>>> Please keep in mind that the new definition has only existed since May
>>> 2015; any data from before the switch was calculated using the old
>>> (undocumented as far as we know) definition.
>>>
>>> You can access the Vital Signs UI at the following URL:
>>>
>>> https://vital-signs.wmflabs.org/#projects=ruwiki,itwiki,dewiki,frwiki,enwiki,eswiki,jawiki/metrics=Pageviews
>>>
>>>
>>> Thanks,
>>>
>>> Nuria
>>>
>>>
>>> [1] https://meta.wikimedia.org/wiki/Research:Page_view
>>>
>>>
>>>
>
>
> --
> *Joseph Allemandou*
> Data Engineer @ Wikimedia Foundation
> IRC: joal
>


Re: [Analytics] Vital signs no longer displaying legacy pageview data

2016-01-21 Thread Kevin Leduc
Can you confirm that Vital Signs pageviews include web crawler and bot
traffic?

On Thu, Jan 21, 2016 at 5:16 PM, Nuria Ruiz  wrote:

> Hello,
>
> The UI for vital signs will no longer display legacy pageview data
> (pageviews calculations that used the old definition). We are working
> towards having one consistent pageview definition [1] in every tool that
> surfaces pageview data.
>
>
> Please keep in mind that the new definition has only existed since May
> 2015; any data from before the switch was calculated using the old
> (undocumented as far as we know) definition.
>
> You can access the Vital Signs UI at the following URL:
>
> https://vital-signs.wmflabs.org/#projects=ruwiki,itwiki,dewiki,frwiki,enwiki,eswiki,jawiki/metrics=Pageviews
>
>
> Thanks,
>
> Nuria
>
>
> [1] https://meta.wikimedia.org/wiki/Research:Page_view
>
>
>


Re: [Analytics] top articles script

2016-01-20 Thread Kevin Leduc
+Analytics list so they can comment.

I don't have such a script.  It's a pretty intensive job to compile top
articles especially over a month.  The pageview API was supposed to have
top articles per month per wiki but the job is so massive that it failed to
run in Hive.  Analytics knows there are better algorithms out there to
solve this problem.  So the pageview API just has top per day per wiki.

I imagine that you are looking at some very specific wikis and countries...
not all of them.  Maybe someone on the list can make an example hive script
(given a wiki and country) that gives the top for a day.
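
Not a Hive script, but as a stopgap here is a sketch of combining the daily top lists into a rough monthly ranking.  It assumes each daily list is a sequence of entries with "article" and "views" keys; the sample data and the function name are invented.  It is only an approximation: an article that misses the top list on some days contributes nothing for those days, so its monthly total is undercounted.  It also does nothing about the per-country part of the question.

```python
from collections import Counter


def approximate_monthly_top(daily_tops, n=10):
    """Sum views across daily top lists and return the n largest totals."""
    totals = Counter()
    for articles in daily_tops:
        for entry in articles:
            totals[entry["article"]] += entry["views"]
    return totals.most_common(n)


# Invented daily top lists, for illustration only.
day1 = [{"article": "Main_Page", "views": 900}, {"article": "Cat", "views": 40}]
day2 = [{"article": "Main_Page", "views": 950}, {"article": "Dog", "views": 70}]
print(approximate_monthly_top([day1, day2], n=2))
```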


On Wed, Jan 20, 2016 at 12:23 PM, Dan Foy  wrote:

> Hi Kevin,
>
> In your collection of scripts for Hive, do you have one that can act as a
> starting point for me to get the top N articles / URLs for Wikipedia in a
> country?
>
> Thanks,
> Dan
>
>
>


[Analytics] 2015 accomplishments by Analytics@Wikimedia

2015-12-23 Thread Kevin Leduc
Hi All,

It has been my pleasure and pride to manage the Analytics Team @ Wikimedia
these past 9 months.  Below are slides and video presentations of some of
our greatest accomplishments in 2015.  BTW I will still be around in a new
capacity managing special projects starting with socializing and defining
new engagement metrics for wikipedia.  A blogpost will be out in February
2016.


Here are the 2015 highlights:

New aggregated pageview dataset
  Slides on Commons
  Presentation on YouTube (7 minutes)

Pageview API
  Slides on Commons
  Presentation on YouTube (9 minutes)

EventLogging + Kafka
  Slides on Commons
  Presentation on YouTube (11 minutes)

EventLogging Data Retention
  Slides on Commons
  Presentation on YouTube (11 minutes)

Dashiki: dashboards configured on-wiki
  Slides on Google Docs
  Presentation on YouTube (10 minutes)


[Analytics] Announcing the pageview API

2015-12-14 Thread Kevin Leduc
Hi All,

It's official: we have a pageview API.  You can read more about it on
Wikipedia's blog:
http://blog.wikimedia.org/2015/12/14/pageview-data-easily-accessible/

You can help us spread the word via
Twitter: https://twitter.com/Wikipedia/status/676511422902218752
or Facebook: https://www.facebook.com/wikipedia/posts/10153697573088346

Congratulations Analytics Team!


Re: [Analytics] Monthly page view ranking

2015-12-01 Thread Kevin Leduc
Hi Anson,

The API should give top views for a month... but the data isn't available
yet:
https://wikimedia.org/api/rest_v1/metrics/pageviews/top/en.wikipedia/all-access/2015/10/all-days

I believe we'll have October and November shortly - I'm going to find out
when by tomorrow.
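
For reference, a small sketch of how the "top" URL above is put together, so the same helper can request a single day or a whole month.  The path layout is read off the URL in this message; treating the last segment as either a two-digit day or the literal "all-days" is my assumption, and the function name is hypothetical.

```python
BASE_URL = "https://wikimedia.org/api/rest_v1/metrics/pageviews/top"


def top_url(project, year, month, day="all-days", access="all-access"):
    """Build a top-articles URL for one day, or for the month with 'all-days'."""
    day_part = day if day == "all-days" else f"{int(day):02d}"
    return f"{BASE_URL}/{project}/{access}/{int(year):04d}/{int(month):02d}/{day_part}"


print(top_url("en.wikipedia", 2015, 10))
print(top_url("en.wikipedia", 2015, 10, day=1))
```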

On Wed, Nov 25, 2015 at 11:07 AM, Ying Haw Lee  wrote:

> Hey guys,
>
> Good afternoon. I am working on a data-mining project on Wikipedia data.
> Our main question is whether the number of page views is correlated with
> the number of reverted edits. In order to have a fair comparison between
> different portals (People, Technology, Math, etc.), I would like to get
> the 250 most viewed pages for each portal (12 in total).
>
> I notice that on Wikipedia, there is a page where weekly page view data
> are aggregated and the 5000 most viewed pages of the week are listed.
>
> However, in order to see the behavior between the number of page views and
> the number of reverted edits, I would need data for a longer duration (Say
> a month or longer).
>
> I wonder if there is any way (hopefully easy way) that I can query the
> data I need.
>
> Your help is highly appreciated.
>
> Happy Thanksgiving!
>
> Anson
>


Re: [Analytics] Transitioning wikistats pageview reports to use new pageview definition

2015-11-10 Thread Kevin Leduc
\o/

On Tue, Nov 10, 2015 at 1:59 PM, Nuria Ruiz  wrote:

> Hello!
>
> The analytics team wishes to announce that we have finally transitioned
> several of the pageview reports in stats.wikimedia.org  to the new
> pageview definition [1]. This means that we should no longer have two
> conflicting sources of pageview numbers.
>
> While we are not fully done transitioning pageview reports, we feel
> this is an important enough milestone to warrant some communication. BIG
> Thanks to Erik Z. for his work on this project.
>
> Please take a look at a report using the new definition (a banner is
> present when a report has been updated):
> http://stats.wikimedia.org/EN/TablesPageViewsMonthlyCombined.htm
>
> Thanks,
>
> Nuria
>
>
> [1] https://meta.wikimedia.org/wiki/Research:Page_view
>
>


[Analytics] A belated project completion shout-out

2015-10-29 Thread Kevin Leduc
I was grooming our Kanban board and believe it's time to close four
projects (epic tasks) which we completed last quarter in support of our
quarterly goals.  It's time to mark these tasks as resolved and think back
on them fondly.

Project: Pageviews in Vital Signs
Animal code name: musk
Result: https://vital-signs.wmflabs.org/
Ticket: https://phabricator.wikimedia.org/T101120

Project: Total Pageview count in Vital Signs
Animal code name: wren
Result: https://vital-signs.wmflabs.org/#projects=all/metrics=Pageviews
Ticket: https://phabricator.wikimedia.org/T96314

Project: Hadoop Cluster Expansion
Animal code name: mule
Result: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hardware
Ticket: https://phabricator.wikimedia.org/T99952

Project: EventLogging on Kafka
Animal code name: stag
Result: https://www.mediawiki.org/wiki/File:EventLogging_on_Kafka_-_Lightning_Talk.pdf
Ticket: https://phabricator.wikimedia.org/T102225


(Photo by Alex Sims)


Re: [Analytics] Gerrit Cleanup Day on Wed 23rd: Are you ready?

2015-09-18 Thread Kevin Leduc
Will someone on the analytics team step up to be the lead for Analytics on
Gerrit Cleanup Day?

On Thu, Sep 17, 2015 at 9:10 AM, Andre Klapper 
wrote:

> Hi Analytics,
>
> the Gerrit Cleanup Day on Wed 23rd is approaching fast - less than one
> week left. More info: https://phabricator.wikimedia.org/T88531
>
> Do you feel prepared for the day and all team members know what to do?
>
> If not, what are you missing and how can we help?
>
> Some Gerrit queries for each team are listed under "Gerrit queries per
> team/area" in https://phabricator.wikimedia.org/T88531
> Are they helpful and a good start? Or do they miss some areas (or do
> you have existing Gerrit team queries to use instead or to "integrate",
> e.g. for parts of MediaWiki core you might work on)?
>
> Also, which person will be the main team contact for the day (and
> available in #wikimedia-dev on IRC) and help organize review work in
> your areas, so other teams could easily reach out?
> Some team plates are emptier than others so they're wondering where and
> how to lend a helping hand (to find out in advance, due to timezones).
>
> Thanks for your help to make the Gerrit Cleanup day a success!
>
> andre
> --
> Andre Klapper | Wikimedia Bugwrangler
> http://blogs.gnome.org/aklapper/
>
>
>


Re: [Analytics] Vital Signs dashboard

2015-08-24 Thread Kevin Leduc
I'm not sure I understand "display no data".  Most metrics have data
starting Jan 2015.

There are some gaps for the metrics on editors on larger wikis.
Wikimetrics times out when it tries to query for those data points (they
are intense queries).

Our longer term plan is to move data from the wiki DBs into Hadoop so we
can leverage that platform to do the calculations.  It's in our tentative
goals for early 2016.

We should talk though... there may be some bridges we can build.


On Mon, Aug 24, 2015 at 4:19 PM, Neil P. Quinn  wrote:

> Hello all!
>
> Almost all of the graphs on the Vital Signs dashboard display no data (the
> only exception is legacy pageviews). Could someone explain to me why that
> is, and whether there's a plan to fix it?
>
> I ask because Vital Signs includes several metrics from the editor model,
> which the Editing department really wants to track on an ongoing basis. I
> need to find out whether we need to pursue other ways of doing so.
>
> Thanks!
> --
> Neil P. Quinn, product analyst
> Wikimedia Foundation
>


Re: [Analytics] Pageviews definition + measurement for apps adding link previews + using RESTBase

2015-08-18 Thread Kevin Leduc
We briefly considered counting views of Hover Cards as Pageviews, but the
idea was quickly dismissed: the feature is not widely used enough to
justify changing the pageview definition.

I'm still open to counting previews as pageviews, but I think the
Readership team and their product managers need to weigh in heavily as
Pageviews is a key metric for them.

Finally, counting Pageviews served through RESTBase sounds like a new
project and I'd like to hear more about the effort needed from the
analytics engineers.


On Tue, Aug 18, 2015 at 4:58 PM, Oliver Keyes  wrote:

> On 18 August 2015 at 19:11, Bernd Sitzmann  wrote:
> > This discussion is about needed updates of the definition and Analytics
> > implementation for mobile apps page view metrics. There is also an
> > associated Phab task[4]. Please add the proper Analytics project there.
> >
> > Background / Changes
> >
> > As you probably remember, the Android app splits a page view into two
> > requests: one for the lead section and metadata, plus another one for the
> > remainder.
> >
> > The mobile apps are going to change the way they load pages in two
> > different ways:
> >
> > We'll add a link preview when someone clicks on a link from a page.
> >
> > We're planning on switching over to using RESTBase for loading pages and
> > also the link preview (initially just the Android beta, later more).
> >
>
> Woah woah woah woah woah. By RESTBase do you mean Gabriel's RESTful
> service API?
>
> Last time I checked that wasn't even consumed by HDFS. Is it now being
> consumed by HDFS?
>
> More importantly the actual URLs are going to look /totally/
> different. If we do not include RESTBase requests, we will miss the
> apps. If we /do/ include RESTBase requests we will not only have to
> rewrite the pageview definition for the apps to recognise the new URL
> scheme, we will also potentially have to rewrite every /other/ bit of
> the definition to /not/ incorporate those requests.
>
> (I use "we" in a collective sense. This isn't my baby any more,
> although if Joseph et al want help with the refactor here I'm happy to
> spend my volunteer time on it).
>
> But basically every other bit of your email is important but now
> secondary: this is a potentially massive change, all on its own, even
> without the link preview, even if the substance of the requests going
> to RESTBase were identical.
>
> > This will have implications for the pageviews definition and how we count
> > user engagement.
> >
> > The big question is
> >
> > Should we count link previews as a page view since it's an indication of
> > user engagement? Or should there be a separate metric for link previews?
> >
> > Counting page views
> >
> > IIRC we currently count action=mobileview&sections=0 query parameters of
> > api.php as a page view. When we publish link previews for all Android app
> > users then we would either want to count also the calls to
> > action=query&prop=extracts as a page view or add them to another metric.
> >
> > Once the apps use RESTBase the HTTPS requests will be very different:
> >
> > Page view: Instead of action=mobileview&sections=0 on the PHP API
> > mentioned above, the app would call the RESTBase endpoint for the lead
> > request [1]. Then it would call [2].
> >
> > Link preview: Instead of action=query&prop=extracts it would call the
> > lead request [1], too, since there is a lot of overlap. At least that's
> > our current plan. The advantage is that the client doesn't need to
> > execute the lead request a second time if the user clicks on the link
> > preview (either through caching or app logic).
> >
> > So, in the RESTBase case we either want to count the
> > mobile-html-sections-lead requests or the mobile-html-sections-remaining
> > requests depending on what our definition for page views actually is. We
> > could also add a query parameter or extra HTTP header to one of the
> > mobile-html-sections-lead requests if we need to distinguish between
> > previews and page views.
> >
> > Both the current PHP API and the RESTBase based metrics would need to be
> > compatible and be collected in parallel since we cannot control when
> > users update their apps.
> >
> > [1] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-lead/Dilbert
> > [2] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-remaining/Dilbert
> > [3] https://www.mediawiki.org/wiki/Wikimedia_Apps/Team/RESTBase_services_for_apps
> >
> > [4] https://phabricator.wikimedia.org/T109383
> >
> >
> > Cheers,
> >
> > Bernd
> >
> >
> >
>
>
>
> --
> Oliver Keyes
> Count Logula
> Wikimedia Foundation
>
>

Re: [Analytics] pageviews_hourly table

2015-08-17 Thread Kevin Leduc
Tilman, to answer your question, the presentation of analytics at Monthly
Metrics Meetings will change month to month.  Next month I am on vacation
so I have asked Jon to present something.  I'm assuming it will have
Pageviews and be readership focused - it's up to Jon.


On Mon, Aug 17, 2015 at 4:16 PM, Oliver Keyes  wrote:

> This seems perfect. Is it currently used?
>
> On 17 August 2015 at 18:03, Andrew Otto  wrote:
> > BTW, Christian foresaw this issue and wrote this:
> > https://github.com/wikimedia/analytics-refinery-source/tree/master/guard
> >
> > It should be useable for pageviews too, I think.  For this issue, a
> > guard that made sure that outreach.wikimedia.org never appeared would
> > have been an error.
> >
> >
> >
> >
> >
> >> On Aug 17, 2015, at 14:45, Oliver Keyes  wrote:
> >>
> >> On 17 August 2015 at 13:48, Joseph Allemandou <jalleman...@wikimedia.org> wrote:
> >>> Hey Oliver,
> >>>
> >>> The analytics team is responsible for the pageview definition.
> >>> When finding issues, sending an email to the analytics mailing list is
> >>> the right thing to do :)
> >>>
> >>
> >> Indeed; my point is not about issues reported upstream. My point is
> >> that there appears to currently be absolutely no work done to take
> >> this (org-level, highest possible priority) KPI and evaluate it every
> >> month or every N days to make sure that, even with the gradual
> >> accretion of changes to the input data, it is still extracting what we
> >> want. It is down to user-reported issues. The problem with this
> >> approach is that after 90 days it is impossible to rerun the data; if
> >> there is a bug breaking the logs, and it takes more than 90 days to
> >> discover it, those logs are simply broken.
> >>
> >> In addition, discovering these issues requires a very granular
> >> understanding of what the pageviews logs are meant to be capturing
> >> that most customers simply will not have. It worked in this case
> >> primarily because the customer actually /wrote/ the definition ;p.
> >>
> >> For public transparency: Joseph and I talked on IRC and will be
> >> working on ways to validate data and detect these kinds of regressions
> >> in advance.
> >>
> >>> On our end, we could surely do a better job of communicating changes in
> >>> the pageview definition code for anybody interested to review/comment/ask
> >>> for documentation.
> >>> Emails have been sent regularly about updates on the analytics list,
> >>> except in the past few months.
> >>> We shall get back to that good habit and send notifications with
> >>> explanations of the changes.
> >>>
> >>> Joseph
> >>>
> >>>
> >>>
> >>>
> >>> On Mon, Aug 17, 2015 at 5:15 PM, Oliver Keyes 
> wrote:
> 
>  You should also note that donate-wiki pageviews are making it into the
>  counts (again, the definition was designed to exclude these).
> 
>  Whose job is it to review pageviews and update the definition when
>  issues are found?
> 
>  On 17 August 2015 at 10:32, Oliver Keyes 
> wrote:
> > Just to clarify; there is no need to ask me before making changes
> > (obviously I find my approval for pageviews changes being sought
> > incredibly flattering, but I am not the only person involved in this
> > project ;p). What I'm more driving towards is directly informing
> > customers when the definition is adapted.
> >
> > On 17 August 2015 at 10:31, Oliver Keyes 
> wrote:
> >> Excellent; thank you.
> >>
> >> On 17 August 2015 at 04:42, Joseph Allemandou
> >>  wrote:
> >>> Oliver,
> >>>
> >>> It was a mistake on my part to add the 'outreach' subdomain without
> >>> asking you.
> >>>
> >>> From a documentation perspective, the analytics team uses this page
> >>> to document changes:
> >>> https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequest
> >>> and I didn't know about the up-to-date documentation you sent.
> >>>
> >>> Tickets have been created to both correct the bug and update the
> >>> documentation pages.
> >>>
> >>> Joseph
> >>>
> >>>
> >>>
> >>> On Sun, Aug 16, 2015 at 8:47 PM, Oliver Keyes <
> oke...@wikimedia.org>
> >>> wrote:
> 
>  Ah, I see the problem; someone patched it and never documented it.
> 
>  We have documentation at
> 
> 
> https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters
>  of the generalised filters. There is also a log, on
>  https://meta.wikimedia.org/wiki/Research:Page_view, of changes
> to the
>  pageview definition.
> 
>  The intent behind both the transparent definition and the log is
> to
>  ensure that we know what is going /in/ the definition.
> 
>  In this case, somebody has patched the definition
> 
> 
>  (
> https://github.com/wikimedia/analytics-refinery-source/commit/cc0b6ed7e4f403eaa82235ec6a

Re: [Analytics] Request for three viewership statistics

2015-07-07 Thread Kevin Leduc
Hi Pine,

At this time, we do not have any means of counting unique visitors to a
particular page or image.  We are evaluating how and why we should do this
along with implications to privacy and the community, but this is a big
hairy project.

Let us know if you were successful getting image view counts.

Kevin Leduc

On Mon, Jul 6, 2015 at 5:29 PM, Pine W  wrote:

> Hi WMF Analytics,
>
> We have a request at Cascadia Wikimedians User Group. Can you determine:
>
> (1) How many unique users saw this geonotice:
> https://en.wikipedia.org/wiki/Wikipedia:Geonotice#Seattle_Wiki-picnic_2015
> ?
>
> (2) During the past 90 days or so, how many unique users have viewed
> https://en.wikipedia.org/wiki/File:Cascadiawikimedians_transparent_Gill_Sans_155px_high.png
> on the various Wikimedia pages where it's included?
>
> (3) During the past 90 days or so, how many times has
> https://en.wikipedia.org/wiki/File:Cascadiawikimedians_transparent_Gill_Sans_155px_high.png
> been viewed on the various Wikimedia pages where it's included?
>
> Thanks,
>
> Pine


Re: [Analytics] Fwd: Wikipedia Page views access

2015-06-19 Thread Kevin Leduc
+ Ariel

Hi Ariel, can you comment on the 503 errors happening sometimes while
trying to download data from the dumps?
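
For anyone scripting downloads in the meantime, sporadic 503s are usually handled by retrying with a growing delay. The following is only a minimal sketch under assumptions: the function name, the default retry count and the delays are made up for illustration, and nothing here is an official recommendation from the dumps maintainers.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, fetch=None, retries=5, base_delay=1.0, sleep=time.sleep):
    """Fetch url, retrying on HTTP 503 with exponential backoff.

    `fetch` and `sleep` are injectable so the retry logic can be exercised
    without touching the network. `retries` is assumed to be at least 1.
    """
    if fetch is None:
        # Default fetcher: plain urllib download of the whole response body.
        fetch = lambda u: urllib.request.urlopen(u).read()
    for attempt in range(retries):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            # Retry only on 503; re-raise other errors and the final failure.
            if err.code != 503 or attempt == retries - 1:
                raise
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

Keeping the delays generous (and the number of parallel downloads small) is the polite choice when the server is already struggling.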



On Fri, Jun 19, 2015 at 1:01 AM, Dario Taraborelli <
dtarabore...@wikimedia.org> wrote:

> Forwarding a note from Ashok Rao (cc’ed), can anyone comment on the dumps
> server returning 503s?
>
> Ashok – we don’t yet have an in-house API to retrieve pageview data, but
> the Analytics team is working on one: see this thread.
> Depending on what you’re doing, http://stats.grok.se/ may also come in
> handy.
>
> Best,
> Dario
>
> Begin forwarded message:
>
> *From: *Ashok Rao 
> *Subject: **Wikipedia Page views access*
> *Date: *June 18, 2015 at 5:53:12 PM GMT+2
> *To: *da...@wikimedia.org
>
> Hi Dario,
>
> Good morning. I'm a student at the University of Pennsylvania and I've
> been trying to perform a few analyses based on Wikipedia page views data.
> I've written a script that grabs data from the main dump site –
> https://dumps.wikimedia.org/other/pagecounts-raw/ – but run into many
> sporadic 503 errors (sometimes with the download link, other times with the
> main page itself). I noticed some of this data might be available directly
> on Wikimedia servers that can be utilized for research purposes.
>
> I was hoping I could get access to this and appreciate your help.
>
> Best,
> Ashok
>
> --
> Ashok M. Rao
> The Rajendra and Neera Singh Program in Market and Social Systems
> Engineering
> School of Engineering and Applied Sciences
> University of Pennsylvania | Class of '17
>


Re: [Analytics] Pageview API Status update

2015-06-14 Thread Kevin Leduc
In light of the recent switch to HTTPS, what about adding http/https
information?  Maybe it could be added to 'access_method' rather than
adding a new dimension?


On Thu, Jun 11, 2015 at 1:46 PM, Jon Katz  wrote:

> Hi Dan,
> Sorry for the late response to this--
>
> ** Make a new cube that examines site versions and client information*
> ** Just use the private data as we're already doing, but aggregate it
> hourly or daily as needed, to make analysis much faster.*
>
> How can I help add/keep this to/on your roadmap?
> -J
>
> On Fri, Jun 5, 2015 at 12:28 PM, Dan Andreescu 
> wrote:
>
>> On Fri, Jun 5, 2015 at 3:09 PM, Oliver Keyes 
>> wrote:
>>
>>> If we can't share it with the public then it seems like it shouldn't
>>> be part of a proposal for an API.
>>>
>>
>> Right, to clarify, this proposal is for a public data set and API.
>>
>>
>>> >>> Thanks Dan, and apologies if these are naive questions:
>>> >>>
>>> >>> For mobile web can we also see beta v. stable?  This is important for
>>> >>> tracking prototypes, which is one of the core product uses for this
>>> data.
>>> >>>
>>> >>> For apps can we see ios v android?
>>>
>>
>> Jon, we chose to not include that information in order to limit the
>> amount of data that we'd have to deal with.  If it gets too large, it won't
>> fit into PostgreSQL.  For the iOS / Android and beta / alpha versions of
>> the site we can either:
>>
>> * Make a new cube that examines site versions and client information
>> * Just use the private data as we're already doing, but aggregate it
>> hourly or daily as needed, to make analysis much faster.


Re: [Analytics] Metrics about the external use of the Wikimedia APIs

2015-06-11 Thread Kevin Leduc
Hi Quim,  thanks for creating a phab task.  I'll add it to our project list
as well.

Just to confirm, you are talking about the use of this API:
http://www.mediawiki.org/wiki/API:Main_page ?



On Wed, Jun 10, 2015 at 11:34 PM, Quim Gil  wrote:

> I have been asking this question informally for too long, so here goes the
> formal request:
>
> Metrics about the external use of the Wikimedia APIs
> https://phabricator.wikimedia.org/T102079
>
> We need them and, in fact, an outsider would be very surprised by the fact
> that we don't have them today and we are not looking at them regularly,
> just like we check page views and edits.
>
> It is a vague goal on a bumpy road, but I'm happy contributing at least
> questions about the metrics we need. The Engineering Community team wants
> to have this metric as the main measurement of success of our performance
> (the more Wikimedia knowledge being spread and improved via our API, the
> better we are doing working with developers).
>
> --
> Quim Gil
> Engineering Community Manager @ Wikimedia Foundation
> http://www.mediawiki.org/wiki/User:Qgil


Re: [Analytics] Pageview API Status update

2015-06-05 Thread Kevin Leduc
I came across another potential requirement from the WP Zero team:
add the x-analytics['zero'] to the dimensions.  This would allow the zero
team to get pageviews per partner carrier.  Our partners are interested in
this data, however, they don't want to share it with anyone as it is
competitive data, and we can't make it public.


On Fri, Jun 5, 2015 at 10:51 AM, Jon Katz  wrote:

> Thanks Dan, and apologies if these are naive questions:
>
> For mobile web can we also see beta v. stable?  This is important for
> tracking prototypes, which is one of the core product uses for this data.
>
> For apps can we see ios v android?
>
>
>
> On Fri, Jun 5, 2015 at 8:39 AM, Oliver Keyes  wrote:
>
>> On 5 June 2015 at 10:38, Dan Andreescu  wrote:
>> >> Gotcha. Reading that proposal it appears to be a proposal for a
>> >> methodology that will enable future proposals; where are the future
>> >> proposals?
>> >
>> >
>> > Well, so the geo cube has to guess a bit at who would find it useful in
>> the
>> > future.
>> >
>> >>
>> >> It also says "in many countries, disease monitoring must be
>> >> carried out at the state or metro-area level" - which countries have
>> >> to be metro-level? Who are we risking the entire reader population
>> >> for, here? Is it one country, or ten, or?
>> >>
>> >> For what it's worth I love the idea of this kind of live stream. But I
>> >> want to make sure that how the various chunks are being prioritised,
>> >> and how critical they are to the outside world, is correlated - and is
>> >> correlated with the underlying data's sensitivity, at that. If we're
>> >> introducing risks by going down to city level and the actual use cases
>> >> for city level data are limited, let's not do that - but this proposal
>> >> doesn't provide thoughts on how limited those use cases are. It just
>> >> says that it's required in some countries.
>> >
>> >
>> > I agree with you, but I'm not sure the data is risky if it's
>> k-anonymous.
>> > Most likely, just doing that will limit the countries for which metro
>> level
>> > data is available.
>>
>> I don't think it is if it is! As you said, though, we need to hammer
>> on it for a while to make absolutely sure it's okay, and using
>> lower-resolution data would not only make this easier but also reduce
>> the cost of getting people wrong (geolocating people to MA is less
>> dangerous than geolocating them to Arlington)
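
To make the k-anonymity idea in this exchange concrete, here is a rough sketch of how small metro-level buckets could be rolled up to country level and suppressed if they are still too small. It is a simplification under stated assumptions: each record is treated as one reader (real k-anonymity is about distinct individuals), and the function name, the (country, area) record shape and the threshold are all hypothetical, not part of the proposal.

```python
from collections import Counter


def k_anonymous_counts(views, k):
    """Aggregate (country, area) records so no published bucket is below k.

    Areas with fewer than k records are folded into a (country, None)
    bucket; any bucket that is still below k after that is dropped.
    """
    fine = Counter(views)  # (country, area) -> number of records
    result = Counter()
    for (country, area), n in fine.items():
        # Keep the metro-level bucket only if it is large enough.
        key = (country, area) if n >= k else (country, None)
        result[key] += n
    # Suppress whatever is still too small, including rolled-up buckets.
    return {key: n for key, n in result.items() if n >= k}
```

As noted in the thread, this kind of thresholding will by itself limit the countries for which metro-level data shows up at all.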
>>
>>
>>
>>
>> --
>> Oliver Keyes
>> Research Analyst
>> Wikimedia Foundation


Re: [Analytics] [Wikimedia-search-private] Search dashboards are now running on live data

2015-05-27 Thread Kevin Leduc
OK, who on the search team needs to be there?

On Wed, May 27, 2015 at 5:33 PM, Oliver Keyes  wrote:

> Indeed, but next step != explicit deliverables. Kevin, could you put
> together a meeting to work out what's specifically being asked for
> here? Then Dan can prioritise and schedule it as part of the standard
> process.
>
> On 27 May 2015 at 17:04, Tomasz Finc  wrote:
> > On Tue, May 26, 2015 at 4:37 PM, Oliver Keyes 
> wrote:
> >> I'm sort of shocked to hear "we're supposed to be presenting this data
> >> at the next metrics meeting": in the future if there are instances
> >> where data is going to be up for public scrutiny, would it be possible
> >> to explicitly associate time for that? My goal is to get us to the
> >> point where our data is reliable all, or at least, most of the time,
> >> and for a fragment of one person's time over two weeks, I think
> >> progress on that is pretty fantastic. But prepping data for that kind
> >> of event does change the priorities and what tasks should be worked
> >> on.
> >>
> >> If we want to present data, generally speaking, let's discuss what we
> >> can show off. If we want to present the dashboards I'll put my all
> >> into making the data at least something where we know the
> >> deficiencies, if not something where we consider the deficiencies
> >> tolerable.
> >
> > This was brought up as a next step during a number of discussions
> > between the team before I left. Let's focus on what it would take and
> > work with the team to do it. I don't want to present data we have no
> > confidence in, but we need to start showcasing the stories that we're
> > learning and pushing search forward for our users.
> >
> > Thank you for your work on this Oliver
> >
> > --tomasz
> >
> > ___
> > Wikimedia-search-private mailing list
> > wikimedia-search-priv...@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikimedia-search-private
>
>
>
> --
> Oliver Keyes
> Research Analyst
> Wikimedia Foundation


Re: [Analytics] stats.grok.se not updating

2015-05-27 Thread Kevin Leduc
The Wikimedia dumps are up to date (
http://dumps.wikimedia.org/other/pagecounts-raw/ )

It appears stats.grok.se has not been getting any operational love lately
and Henrik has been silent.

On Tue, May 26, 2015 at 10:12 PM, Vipul Naik  wrote:

> I just noticed that stats.grok.se doesn't have any data beyond Saturday
> May 23. Wondering if Henrik or others know what the issue is (are the
> Wikimedia dumps not up-to-date, or has stats.grok.se not been running the
> updating scripts?)
>
> Vipul


Re: [Analytics] clicks on red links

2015-05-22 Thread Kevin Leduc
We do not have such statistics.

I wonder if it would be possible to set up an EventLogging schema to log
hits to redlinks and what happens after.

On Wed, May 20, 2015 at 10:37 PM, Amir E. Aharoni <
amir.ahar...@mail.huji.ac.il> wrote:

> Hi,
>
> Are there statistics about the number of people who click on red links in
> Wikimedia projects?
>
> And about what they do as the next step - go back, close the page, create
> an article, something else?
>
> --
> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
> http://aharoni.wordpress.com
> ‪“We're living in pieces,
> I want to live in peace.” – T. Moore‬


Re: [Analytics] "Maybe Analytics" project in Phabricator

2015-04-27 Thread Kevin Leduc
+1 to Dan

On Monday, April 27, 2015, Dan Andreescu  wrote:
> Sounds to me like the nuance we were trying to go for is causing
> confusion.  This is unintended and my opinion is that we should remove
> maybe-analytics and just tell everyone to use blocked-on-analytics as
> liberally as they wish.
> On Mon, Apr 27, 2015 at 1:45 AM, Andre Klapper  wrote:
>>
>> On Fri, 2015-04-17 at 18:15 -0700, Grace Gellerman wrote:
>> > The project is intended for Analytics customers to alert Analytics of
>> > work in their products that they think might intersect with ours. It's
>> > a way of giving Analytics an early heads-up so that Analytics can
>> > either say,"Thanks for the early warning!" or "Thanks, but this does
>> > not touch Analytics."
>> >
>> >
>> > We can remind participants at Scrum-of-Scrums that they can use this
>> > project.
>>
>> Isn't that pretty much what
>> https://phabricator.wikimedia.org/tag/blocked-on-analytics/ is for?
>> Both projects should receive urgent triage anyway (and hence a decision
>> whether a task is actually Analytics territory or not), but I see zero
>> folks listed under "Watchers" [1] on either project page?
>>
>> > So for now, please do not archive it.  Thanks!
>>
>> I would like to archive that project soon, given my comment above.
>> Furthermore, that project has been entirely unused (maybe because nobody
>> has ever heard of that project...).
>>
>> If I imagined every project to have a corresponding maybe-project, we'd
>> just create unneeded abstraction layers.
>> Newly created tasks should receive triage. One triage step is defining
>> if the task is associated with the right project(s). No "maybe" needed.
>>
>> Cheers,
>> andre
>>
>> [1] https://www.mediawiki.org/wiki/Phabricator/Help#Receiving_updates_and_notifications
>> --
>> Andre Klapper | Wikimedia Bugwrangler
>> http://blogs.gnome.org/aklapper/


Re: [Analytics] Task for your attention - update to app uniques and session reports

2015-04-22 Thread Kevin Leduc
I'll bring this up for review at our tasking meeting Thursday morning and
get an estimate of points to complete the task.

On Wed, Apr 22, 2015 at 3:44 PM, Nuria Ruiz  wrote:

> Please cc analytics@ so the whole team sees these requests.
>
> On Wed, Apr 22, 2015 at 3:09 PM, Dan Garry  wrote:
>
>> Hey Kevin,
>>
>> Task for your attention: T96926
>> 
>>
>> The following patches are ready to be merged in the iOS and Android apps
>> when that task is resolved:
>>
>>- https://gerrit.wikimedia.org/r/#/c/205980/
>>- https://gerrit.wikimedia.org/r/#/c/205976/
>>
>> Let me know if you've got any questions.
>>
>> Thanks,
>> Dan
>>
>> --
>> Dan Garry
>> Product Manager, Search and Discovery
>> Wikimedia Foundation
>>


Re: [Analytics] "Maybe Analytics" project in Phabricator

2015-04-17 Thread Kevin Leduc
I am not opposed to archiving it... but would like to hear from our
agile-coach Grace who created this project.  She's offsite at a training
and will be back in the office next week.

On Fri, Apr 17, 2015 at 2:48 PM, Andre Klapper 
wrote:

> Today somebody on IRC pointed out the existence of
> https://phabricator.wikimedia.org/tag/maybe_analytics/
> which seems to be entirely unused (created in Feb 2015).
>
> Its description implies that its intended use is more or less the same
> as the #Blocked-on-Analytics project (created in Dec 2014).
>
> So can this project be archived?
> If not, how do you plan to actually use it?
>
> Generally speaking: I'm not aware of a task where the creation of this
> project was proposed / discussed. For future reference, please respect
> https://www.mediawiki.org/wiki/Phabricator/Creating_and_renaming_projects
>
> andre
> --
> Andre Klapper | Wikimedia Bugwrangler
> http://blogs.gnome.org/aklapper/


Re: [Analytics] New fields in wmf.webrequest hive table

2015-04-13 Thread Kevin Leduc
Look at the record_version field to know if the new column is populated.
https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequest#Changes_and_known_problems_since_2015-03-04

On Sun, Apr 12, 2015 at 10:43 AM, Toby Negrin  wrote:

> Hi Yuri --
>
> In general, I do not think this table will change a lot moving forward.
> We're migrating to a more complete definition right now so some changes are
> to be expected but things should settle down.
>
> Thanks for the new fields!
>
> -Toby
>
> On Sun, Apr 12, 2015 at 9:55 AM, Andrew Otto  wrote:
>
>> You probably have to do it conditionally by date
>>
>>
>> On Apr 12, 2015, at 12:38, Yuri Astrakhan 
>> wrote:
>>
>> Thanks Oliver! Is there a way to handle it in hql? E.g if(
>> exists(is_pageview),is_pageview,null)?  Finding out if field exists by
>> observing query crash seems wrong ))
>> On Apr 12, 2015 06:53, "Oliver Keyes"  wrote:
>>
>>> (Duplicated from bug):
>>>
>>> That's not a bug. The complexity of regenerating ~60 days of data,
>>> where a day is 24*60*125000 rows, is extreme, and adding new fields
>>> means doing just that - regenerating the entire thing. As such, the
>>> decision was made to add to the field definition and only add actual
>>> values going forward from the point at which the patch was merged.
>>> This was true of the is_pageview calculation, the user agent data and
>>> the geolocation elements previously added, and is still true now.
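
For a sense of scale, the figures quoted above can be multiplied out. This is just the arithmetic from the message, assuming the 125000 factor means rows per minute (the message does not say so explicitly) and taking "~60 days" as exactly 60.

```python
# Assumption: 125000 is read as rows per minute; the source gives only
# the product "24*60*125000" for one day.
ROWS_PER_MINUTE = 125_000
MINUTES_PER_DAY = 24 * 60
DAYS = 60  # "~60 days" taken as exactly 60 for the estimate

rows_per_day = MINUTES_PER_DAY * ROWS_PER_MINUTE
rows_total = rows_per_day * DAYS

print(f"{rows_per_day:,} rows per day")               # 180,000,000 rows per day
print(f"{rows_total:,} rows over {DAYS} days")        # 10,800,000,000 rows over 60 days
```

So a backfill would mean rewriting on the order of ten billion rows, which is the cost being weighed against leaving older partitions with NULL in the new columns.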
>>>
>>> On 11 April 2015 at 03:33, Yuri Astrakhan 
>>> wrote:
>>> > I tried to move Zero analytics to the new table, and decided to test
>>> > the new wonderful fields like agent_type ... and it only works on the
>>> > most recent hours of data ((
>>> >
>>> > https://phabricator.wikimedia.org/T95806
>>> >
>>> >
>>> > On Fri, Apr 10, 2015 at 8:51 PM, Yuri Astrakhan <
>>> yastrak...@wikimedia.org>
>>> > wrote:
>>> >>
>>> >> Please clarify why the field "is_zero" is needed, as it is nothing
>>> >> more than a test for ("zero=" in x_analytics). Does having this field
>>> >> significantly improve performance for zero queries, e.g. "select
>>> >> count(*) from requests where is_zero = true"? Because otherwise it
>>> >> simply identifies "zero partner" traffic, not "was that request
>>> >> actually zero rated or not".
>>> >>
>>> >> Thanks!
>>> >>
>>> >> On Fri, Apr 10, 2015 at 5:16 PM, Oliver Keyes 
>>> >> wrote:
>>> >>>
>>> >>> Cool!
>>> >>>
>>> >>> On 10 April 2015 at 17:12, Joseph Allemandou <
>>> jalleman...@wikimedia.org>
>>> >>> wrote:
>>> >>> > Yes Oliver, the agent_type = spider includes IsCrawler UDF.
>>> >>> >
>>> >>> > On Fri, Apr 10, 2015 at 11:08 PM, Oliver Keyes <
>>> oke...@wikimedia.org>
>>> >>> > wrote:
>>> >>> >>
>>> >>> >> What does agent-type add? In the sense that if we're pre-parsing
>>> >>> >> the user agent, surely the difference is between "WHERE agent_type !=
>>> >>> >> 'spider'" and "WHERE user_agent_map['device_family'] != 'Spider'"?
>>> >>> >> Does agent_type include the isCrawler UDF results?
>>> >>> >>
>>> >>> >> On 10 April 2015 at 16:47, Joseph Allemandou
>>> >>> >> 
>>> >>> >> wrote:
>>> >>> >> > And I forgot one field :
>>> >>> >> >
>>> >>> >> > is_zero - True if a request is made on a zero provider.
>>> >>> >> >
>>> >>> >> >
>>> >>> >> > On Fri, Apr 10, 2015 at 10:36 PM, Leila Zia <
>>> le...@wikimedia.org>
>>> >>> >> > wrote:
>>> >>> >> >>
>>> >>> >> >> Hi Joseph,
>>> >>> >> >>
>>> >>> >> >> Thanks for the update, and for doing this. These three items
>>> >>> >> >> make the analysis of the data much easier on our end. We've
>>> >>> >> >> had many requests in the past that required agent_type and
>>> >>> >> >> access_method information and having them readily available
>>> >>> >> >> is awesome! :-)
>>> >>> >> >>
>>> >>> >> >> Have a great weekend!
>>> >>> >> >>
>>> >>> >> >> Leila
>>> >>> >> >>
>>> >>> >> >> On Fri, Apr 10, 2015 at 1:21 PM, Joseph Allemandou
>>> >>> >> >>  wrote:
>>> >>> >> >>>
>>> >>> >> >>> Hi Analytics people,
>>> >>> >> >>>
>>> >>> >> >>> Today brings another batch of additions to the refined
>>> >>> >> >>> webrequest table in Hive.
>>> >>> >> >>> Now the table contains:
>>> >>> >> >>>
>>> >>> >> >>> ts - The unix timestamp (milliseconds) version of the dt date
>>> >>> >> >>> access_method - The method used to access the site, one of
>>> >>> >> >>> [mobile app | mobile web | desktop]
>>> >>> >> >>> agent_type - To differentiate easily between spiders and users
>>> >>> >> >>> (more values may be added later).
>>> >>> >> >>> These additions are based on the "tags", as defined here:
>>> >>> >> >>> https://meta.wikimedia.org/wiki/Research:Page_view
>>> >>> >> >>>
>>> >>> >> >>> Have a good weekend !
>>> >>> >> >>>
>>> >>> >> >>> --
>>> >>> >> >>> Joseph Allemandou
>>> >>> >> >>> Data Engineer @ Wikimedia Foundation
>>> >>> >> >>> IRC: joal
>>> >>> >> >>>
>>> >>> >> >>> 

Re: [Analytics] Eventlogging outage

2015-04-08 Thread Kevin Leduc
The data loss and the lack of backfilling are documented in the incident report:
https://wikitech.wikimedia.org/wiki/Incident_documentation/20150406-EventLogging#Actionables

On Wed, Apr 8, 2015 at 10:40 AM, Dan Andreescu 
wrote:

> It did cause data loss, and we can not backfill because the disk was full
> so the logs were not written.
>
> On Wed, Apr 8, 2015 at 1:37 PM, Aaron Halfaker 
> wrote:
>
>> Thanks Nuria.
>>
>> Did this cause data loss and if so, is there a plan to backfill?
>>
>> -Aaron
>>
>> On Wed, Apr 8, 2015 at 12:28 PM, Nuria Ruiz  wrote:
>>
>>> Team:
>>>
>>> As you might know, we have swapped EL's old vanadium box for a newer,
>>> more resilient one.
>>>
>>> This new box had less disk space and the move caused a small outage due
>>> to a bug already present in the EL code that was not apparent on vanadium.
>>>
>>> Details can be found here:
>>>
>>>
>>> https://wikitech.wikimedia.org/wiki/Incident_documentation/20150406-EventLogging
>>>
>>> Thanks,
>>>
>>> Nuria


Re: [Analytics] Parsoid Performance Metrics

2015-04-02 Thread Kevin Leduc
Thanks Christy,

I have added these dashboards to our list of dashboards [1].  Is the
parsoid team also aware of these dashboards?  I wouldn't assume they are
paying close attention to the Analytics mailing list.


[1] https://meta.wikimedia.org/wiki/Research:Data/Dashboards

On Tue, Mar 31, 2015 at 11:02 AM, E.C Okpo  wrote:

> Hello,
>
> Parsoid now has dashboards that track performance metrics for both the
> html to wikitext (1) and wikitext to html (2) routes. Performance
> instrumentation was achieved with StatsD, Graphite and Grafana.
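
For readers unfamiliar with how a timing reaches such a dashboard: a StatsD timer is a one-line plain-text datagram of the form `name:value|ms`, normally sent over UDP. The sketch below only illustrates that wire format in Python; it is not how Parsoid itself is instrumented, and the metric name, host and port used here are assumptions (8125 is the conventional StatsD port).

```python
import socket


def format_timing(metric, millis):
    """Build a StatsD timer datagram, e.g. b'some.metric:42|ms'."""
    return f"{metric}:{int(round(millis))}|ms".encode("ascii")


def send_timing(metric, millis, host="127.0.0.1", port=8125):
    """Fire-and-forget a timer value to a StatsD daemon over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_timing(metric, millis), (host, port))


# Hypothetical metric name, for illustration only.
print(format_timing("parsoid.wt2html.total", 41.6))
```

Because the transport is UDP, a missing or slow StatsD daemon does not block the code being measured, which is why this pattern is cheap to add to a request path.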
>
> I also compiled a guide (3) to this process for future reference, though
> your mileage might vary.
>
> These materials were created as part of my FOSS-OPW Internship with the
> Parsoid team, which ends today :(. It's been such a blast working with the
> Parsoid team, meeting members of the community and getting a taste of
> working on Open Source Software.
>
> Regards,
> Christy Okpo
>
> (1) http://grafana.wikimedia.org/#/dashboard/db/parsoid-timing-html2wt
> (2) http://grafana.wikimedia.org/#/dashboard/db/parsoid-timing-wt2html
> (3)
> https://www.mediawiki.org/w/index.php?title=Parsoid/Adding_instrumentation_how-to


Re: [Analytics] [Technical][Request for Comment] A new format for the pageview dumps

2015-03-16 Thread Kevin Leduc
I'm curious as to why you are dropping the byte-count.  I'm not opposed to it,
just wondering if that data is not valuable.



On Fri, Mar 13, 2015 at 12:06 PM, Oliver Keyes  wrote:

> So, we've got a new pageviews definition; it's nicely integrated and
> spitting out TRUE/FALSE values on each row with the best of em. But
> what does that mean for third-party researchers?
>
> Well...not much, at the moment, because the data isn't being released
> somewhere. But one resource we do have that third-parties use a heck
> of a lot, is the per-page pageviews dumps on dumps.wikimedia.org.
>
> Due to historical size constraints and decision-making (and by
> historical I mean: last decade) these have a number of weirdnesses in
> formatting terms; project identification is done using a notation
> style not really used anywhere else, mobile/zero/desktop appear on
> different lines, and the files are space-separated. I'd like to put
> some volunteer time into spitting out dumps in an easier-to-work-with
> format, using the new definition, to run in /parallel/ with the
> existing logs.
>
> *The new format*
> At the moment we have the format:
>
> project_notation - encoded_title - pageviews - bytes
>
> This puts zero and mobile requests to pageX in a different place to
> desktop requests, requires some reconstruction of project_notation,
> and contains (for some use cases) extraneous information - that being
> the byte-count. The files are also headerless, unquoted and
> space-separated, which saves space but is sometimes...I think the term
> is "h-inducing".
>
> What I'd like to use as a new format is:
>
> full_project_url - encoded_title - desktop_pageviews -
> mobile_and_zero_pageviews
>
> This file would:
>
> 1. Include a header row;
> 2. Be formatted as a tab-separated, rather than space-separated, file;
> 3. Exclude bytecounts;
> 4. Include desktop and mobile pageview counts on the same line;
> 5. Use the full project URL ("en.wikivoyage.org") instead of the
> pagecounts-specific notation ("en.v")
>
> So, as a made-up example, instead of:
>
> de.m.v Florence 32 9024
> de.v Florence 920 7570
>
> we'd end up with:
>
> de.wikivoyage.org Florence 920 32
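
The conversion in this made-up example can be sketched roughly as follows. This is not Oliver's script; it is a minimal illustration under assumptions: the suffix table contains only the single mapping used in the example above (it is not the real project-notation table), an "m" component is taken to mark mobile, and zero traffic is not handled because its marker is not given in the message.

```python
import csv
import io
from collections import defaultdict

# Illustrative only: taken from the made-up example, not a verified mapping.
SUFFIX_TO_DOMAIN = {"v": "wikivoyage.org"}

HEADER = ["full_project_url", "encoded_title",
          "desktop_pageviews", "mobile_and_zero_pageviews"]


def split_project(notation):
    """Return (full_project_url, is_mobile) for notation such as 'de.m.v'."""
    lang, *rest = notation.split(".")
    is_mobile = "m" in rest
    suffix = [part for part in rest if part != "m"]
    if len(suffix) != 1 or suffix[0] not in SUFFIX_TO_DOMAIN:
        raise ValueError(f"unknown project notation: {notation}")
    return f"{lang}.{SUFFIX_TO_DOMAIN[suffix[0]]}", is_mobile


def convert(legacy_lines):
    """Merge legacy space-separated lines into the proposed TSV layout."""
    totals = defaultdict(lambda: [0, 0])  # (project, title) -> [desktop, mobile]
    for line in legacy_lines:
        notation, title, views, _byte_count = line.split()
        project, is_mobile = split_project(notation)
        totals[(project, title)][1 if is_mobile else 0] += int(views)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)  # the new format includes a header row
    for (project, title), (desktop, mobile) in sorted(totals.items()):
        writer.writerow([project, title, desktop, mobile])
    return out.getvalue()


print(convert(["de.m.v Florence 32 9024", "de.v Florence 920 7570"]))
```

Grouping by (project, title) before writing is what puts the desktop and mobile counts on one line; the byte-count column is read and discarded.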
>
> In the future we could also work to /normalise/ the title - replacing
> it with the page title that refers to the actual pageID. This won't
> impact legacy files, and is currently blocked on the Apps team, but
> should be viable as soon as that blocker goes away.
>
> I've written a script capable of parsing and reformatting the legacy
> files, so we should be able to backfill in this new format too, if
> that's wanted (see below).
>
> *The size constraints*
>
> There really aren't any. Like I said, the historical rationale for a
> lot of these decisions seems to have been keeping the files small. But
> by putting requests to the same title from different site versions on
> the same line, and dropping byte-count, we save enough space that the
> resulting files are approximately the same size as the old ones - or
> in many cases, actually smaller.
>
> *What I'm asking for*
>
> Feedback! What do people think of the new format? What would they like
> to see that they don't? What don't they need, here? How useful would
> normalisation be? How useful would backfilling be?
>
> *What I'm not asking for*
> WMF time! Like I said, this is a spare-time project; I've also got
> volunteers for Code Review and checking, too (Yuvi and Otto).
>
> The replacement of the old files! Too many people depend on that
> format and that definition, and I don't want to make them sad.
>
> Thoughts?
>
> --
> Oliver Keyes
> Research Analyst
> Wikimedia Foundation


[Analytics] [Data][Outage] Statistics per wikipedia for 2015

2015-02-27 Thread Kevin Leduc
Hi Erik Z,

A member from the Czech community requested an update on when stats for
2015 per wiki [0] will be available.  The email was sent to the wikimetrics
list [1] so I am relaying it here.

[0] http://stats.wikimedia.org/CS/TablesWikipediaCS.htm
[1]
https://lists.wikimedia.org/pipermail/wikimetrics/2015-February/000258.html


Re: [Analytics] Application to work in field of Research and Data Analytics at Wikimedia Engineering

2015-02-25 Thread Kevin Leduc
Hi Vikram,

As a volunteer, you may want to look at this page for some tasks the
analytics team identified as simple enough for a volunteer to take on:
https://phabricator.wikimedia.org/tag/analytics-volunteering/

The best way to start a discussion about one of these tasks is on the
#wikimedia-analytics IRC channel.

On Wed, Feb 25, 2015 at 12:40 AM, Quim Gil  wrote:

> Hi,
>
> On Tue, Feb 24, 2015 at 11:18 PM, Pine W  wrote:
>
>> I am including Quim Gil in this email reply. He may be able to match your
>> interests with available opportunities.
>>
> Available opportunities for jobs at the Wikimedia Foundation can be found
> at https://wikimediafoundation.org/wiki/Work_with_us
>
> Identified opportunities to contribute to Wikimedia Analytics as a
> volunteer (which is a possible path to become a strong job candidate) can
> be found in https://www.mediawiki.org/wiki/Annoying_little_bugs#Analytics
>
> --
> Quim Gil
> Engineering Community Manager @ Wikimedia Foundation
> http://www.mediawiki.org/wiki/User:Qgil
>
> ___
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>


Re: [Analytics] Provenance Params

2015-02-23 Thread Kevin Leduc
Personally, I would rather see the parameter named something other than
"analytics".  It's too generic.  I would suggest using "source",
"provenance" or even "share_a_fact"
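
To make the naming question concrete, here is a rough sketch of what appending and stripping such a parameter could look like.  The parameter name "source" is only one of the suggestions above, not a decision, and the stripping would really happen in Varnish; the Python below just illustrates the URL normalization and is not anyone's actual implementation.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter name; the thread has not settled on one.
PARAM = "source"


def add_provenance(url, value, param=PARAM):
    """Return url with the provenance parameter appended as the last query parameter."""
    parts = urlsplit(url)
    # Drop any earlier copy of the parameter so it appears exactly once, at the end.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    query.append((param, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


def strip_provenance(url, param=PARAM):
    """Return (clean_url, value); clean_url is what a cache would key on."""
    parts = urlsplit(url)
    value = None
    kept = []
    for k, v in parse_qsl(parts.query, keep_blank_values=True):
        if k == param:
            value = v
        else:
            kept.append((k, v))
    return urlunsplit(parts._replace(query=urlencode(kept))), value
```

Because the cleaned URL is identical whatever the parameter value was, every variant of a shared link maps to the same cache entry, which is the fragmentation concern raised in the quoted message.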


On Mon, Feb 23, 2015 at 5:10 PM, Kevin Leduc  wrote:

> Oliver, the discussion is on the formatting of the URL that is posted on
> user's twitter or facebook feed when they use the "share a fact" feature.
> We can't set headers at this point because users are clicking on the link
> from another site.
>
>
>
>
>
> On Mon, Feb 23, 2015 at 4:50 PM, Oliver Keyes 
> wrote:
>
>> Why not just throw something into x_analytics and aggregate by that value?
>>
>> On 23 February 2015 at 19:41, Adam Baso  wrote:
>> > Hi all -
>> >
>> > I'm checking with people in ops, but we're planning to add a well
>> defined
>> > parameter to the end of URLs to see the level of clickthroughs on such
>> > links. For example:
>> >
>> > https://en.wikipedia.org/wiki/Epirus?analytics=ios_share_a_fact_v1
>> >
>> > (If there are existing params on the URL - not an issue so far that I
>> know
>> > of for the apps as they canonicalize the title and URL - then the param
>> > would be last in the ampersand separated query string parameter.)
>> >
>> > And then we'd use Varnish to remove the parameter to reduce the risk of
>> > cache fragmentation.
>> >
>> > We "know" this is probably only a short term solution, and as a follow
>> up
>> > from the meeting with the people on the CC line, I'm emailing to open
>> the
>> > discussion on options for a more generic option.
>> >
>> > So far I think there are a few options from what we've discussed, if
>> we're
>> > to support additional bucketing.
>> >
>> > (1) More parameters (e.g., ?analytics=ios_share_a_fact&version=1)
>> > Downside: potentially harder to standardize and remove things from the
>> URL
>> >
>> > (2) More conventional provenance (e.g.,
>> >
>> https://en.wikipedia.org/w/index.php?title=Castle&oldid=645632619/ref=_wref_source%3Dapp
>> <...more
>> > provenance info as desired>/).
>> > Downside: technically speaking, may break the schema of well-formed
>> titles
>> >
>> > (3) Rely upon (1) or (2), or perhaps an even more RESTful shortlinker
>> (it
>> > could have features like target - web or w:// or wiki:// protocol or
>> > whatever - versioning, etc.).
>> > Downside: maybe a little more work to stand up service. As we recalled,
>> > there's an extension out there that may, perhaps with some tweaks, fit
>> the
>> > bill.
>> >
>> >
>> >
>> >
>> > -Adam
>> >
>> >
>> > ___
>> > Analytics mailing list
>> > Analytics@lists.wikimedia.org
>> > https://lists.wikimedia.org/mailman/listinfo/analytics
>> >
>>
>>
>>
>> --
>> Oliver Keyes
>> Research Analyst
>> Wikimedia Foundation
>>
>> ___
>> Analytics mailing list
>> Analytics@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>
>


Re: [Analytics] Provenance Params

2015-02-23 Thread Kevin Leduc
Oliver, the discussion is on the formatting of the URL that is posted on
user's twitter or facebook feed when they use the "share a fact" feature.
We can't set headers at this point because users are clicking on the link
from another site.
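
Since the URL is the only thing that travels with the click, the clickthrough level would have to be computed from the request URLs themselves.  A minimal sketch of that aggregation, assuming the "analytics" parameter from Adam's example; the sample URLs are made up for illustration and this is not how the request logs are actually processed:

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs


def count_clickthroughs(urls, param="analytics"):
    """Count requests per provenance value; URLs without the parameter are ignored."""
    counts = Counter()
    for url in urls:
        values = parse_qs(urlsplit(url).query).get(param)
        if values:
            # If the parameter is repeated, the last occurrence is used.
            counts[values[-1]] += 1
    return counts


# Hypothetical request URLs, for illustration only.
sample = [
    "https://en.wikipedia.org/wiki/Epirus?analytics=ios_share_a_fact_v1",
    "https://en.wikipedia.org/wiki/Castle?analytics=ios_share_a_fact_v1",
    "https://en.wikipedia.org/wiki/Castle?analytics=android_share_a_fact_v1",
    "https://en.wikipedia.org/wiki/Castle",
]
counts = count_clickthroughs(sample)
```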





On Mon, Feb 23, 2015 at 4:50 PM, Oliver Keyes  wrote:

> Why not just throw something into x_analytics and aggregate by that value?
>
> On 23 February 2015 at 19:41, Adam Baso  wrote:
> > Hi all -
> >
> > I'm checking with people in ops, but we're planning to add a well defined
> > parameter to the end of URLs to see the level of clickthroughs on such
> > links. For example:
> >
> > https://en.wikipedia.org/wiki/Epirus?analytics=ios_share_a_fact_v1
> >
> > (If there are existing params on the URL - not an issue so far that I
> know
> > of for the apps as they canonicalize the title and URL - then the param
> > would be last in the ampersand separated query string parameter.)
> >
> > And then we'd use Varnish to remove the parameter to reduce the risk of
> > cache fragmentation.
> >
> > We "know" this is probably only a short term solution, and as a follow up
> > from the meeting with the people on the CC line, I'm emailing to open the
> > discussion on options for a more generic option.
> >
> > So far I think there are a few options from what we've discussed, if
> we're
> > to support additional bucketing.
> >
> > (1) More parameters (e.g., ?analytics=ios_share_a_fact&version=1)
> > Downside: potentially harder to standardize and remove things from the
> URL
> >
> > (2) More conventional provenance (e.g.,
> >
> https://en.wikipedia.org/w/index.php?title=Castle&oldid=645632619/ref=_wref_source%3Dapp
> <...more
> > provenance info as desired>/).
> > Downside: technically speaking, may break the schema of well-formed
> titles
> >
> > (3) Rely upon (1) or (2), or perhaps an even more RESTful shortlinker (it
> > could have features like target - web or w:// or wiki:// protocol or
> > whatever - versioning, etc.).
> > Downside: maybe a little more work to stand up service. As we recalled,
> > there's an extension out there that may, perhaps with some tweaks, fit
> the
> > bill.
> >
> >
> >
> >
> > -Adam
> >
> >
> > ___
> > Analytics mailing list
> > Analytics@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/analytics
> >
>
>
>
> --
> Oliver Keyes
> Research Analyst
> Wikimedia Foundation
>
> ___
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>


Re: [Analytics] Welcome Joseph

2015-02-19 Thread Kevin Leduc
Welcome Joseph!

On Wed, Feb 18, 2015 at 9:40 PM, Leila Zia  wrote:

> Welcome to the team, Joseph!
>
> b.t.w., I didn't know you have a background in NLP. That skill may become
> handy soon. ;-)
>
> On Wed, Feb 18, 2015 at 6:37 PM, Toby Negrin 
> wrote:
>
>> Hi Everyone,
>>
>> I'd like to welcome Joseph Allemendou to the Analytics team! We are
>> really excited to get someone of Joseph's calibre to help take our analytics
>> work to the next level.
>>
>> In his own words:
>>
>> Joseph's experiences were mostly with private companies and almost
>> always involved open source software. After a M.S. in Computer Science
>> with a specialization in programming languages theory and a PhD in the
>> Natural Language Processing and Dialog Systems fields, Joseph worked
>> four years in Ireland. He spent two years at IBM learning and applying
>> project management and process improvement methodologies, and two other
>> years building a start-up to help English as a foreign language teachers
>> find up-to-date teaching material. Then he moved back to France and worked
>> for Criteo as a specialist in scalability for one year, and as a manager for
>> another year. Lastly Joseph worked with Fotolia, where he built the
>> analytics architecture and team. Working with the Wikimedia Foundation
>> allows him to really apply his energy and skills in the direction he wishes
>> the world to move in.
>>
>> Joseph is based in Brittany, France. Welcome Joseph!
>>
>> -Toby
>>
>> ___
>> Analytics mailing list
>> Analytics@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>>
>
> ___
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>


Re: [Analytics] [Technical] capacity planning for WikiGrok test 4

2015-02-18 Thread Kevin Leduc
thanks for the heads up.  I went ahead and resolved the associated task.

On Wed, Feb 18, 2015 at 9:14 AM, Nuria Ruiz  wrote:

> Note that there is not much work to do in this regard. We already talked
> with Kaldari in early January about levels of sustainable throughput, and
> our capacity hasn't changed.
>
> It is good to get  a heads up (thanks!) but I think that is all that is
> needed.
>
>
>
>
> On Wed, Feb 18, 2015 at 7:38 AM, Grace Gellerman wrote:
>
>> Thanks for the early warning, Leila!
>>
>> I created:
>>
>> https://phabricator.wikimedia.org/T89827
>>
>> to track this work.
>>
>> On Tue, Feb 17, 2015 at 5:28 PM, Leila Zia  wrote:
>>
>>> Hi,
>>>
>>>The Mobile team will be running WikiGrok experiments in the first
>>> half of March 2015. Dario and I will be working closely with the team and
>>> will coordinate with Analytics-devs to make sure EventLogging can handle
>>> the throughput. The expected throughput is what EL experienced through the
>>> last WikiGrok experiment in early January. This email is a heads up since
>>> EL has limited capacity, other teams may want to run experiments, and we
>>> need to plan for experiments in advance.
>>>
>>> Best,
>>> Leila
>>>
>>> ___
>>> Analytics mailing list
>>> Analytics@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>
>>>
>>
>> ___
>> Analytics mailing list
>> Analytics@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>>
>
> ___
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>


Re: [Analytics] stats.grok.se not updating

2015-02-12 Thread Kevin Leduc
Thanks Henrik!

On Thu, Feb 12, 2015 at 9:32 AM, Henrik Abelsson 
wrote:

>  Hi Kevin,
>
> I'm rerunning the missing days, but the download from dumps.wikimedia.org
> is much slower than normal, I get on the order of 30-50 KB/sec on a 100mbit
> connection. Anyway, as soon as the days are downloaded and processed they
> should be back up on stats.grok.se again.
>
> -henrik
>
> On 11/02/15 22:57, Kevin Leduc wrote:
>
> Thank you!  Would you mind posting a note on Analytics@lists.wikimedia.org
> when it is working normally again?
>
> On Wed, Feb 11, 2015 at 1:36 PM, Henrik Abelsson 
> wrote:
>
>>  Hi Kevin,
>>
>> Looking into it!
>>
>> -henrik
>>
>>
>> On 11/02/15 16:36, Kevin Leduc wrote:
>>
>> Hi Henrik,
>>
>>  stats.grok.se has missing data in the last week.  Can you restart the
>> service to see if that helps?
>>
>>  Thanks!
>> Kevin Leduc
>> Analytics Product Manager
>>
>>
>>
>
>


Re: [Analytics] stats.grok.se not updating

2015-02-11 Thread Kevin Leduc
Thank you!  Would you mind posting a note on Analytics@lists.wikimedia.org
when it is working normally again?

On Wed, Feb 11, 2015 at 1:36 PM, Henrik Abelsson 
wrote:

>  Hi Kevin,
>
> Looking into it!
>
> -henrik
>
>
> On 11/02/15 16:36, Kevin Leduc wrote:
>
> Hi Henrik,
>
>  stats.grok.se has missing data in the last week.  Can you restart the
> service to see if that helps?
>
>  Thanks!
> Kevin Leduc
> Analytics Product Manager
>
>
>


Re: [Analytics] No information on http://stats.grok.se/

2015-02-11 Thread Kevin Leduc
Yes, I notified the volunteer who maintains that system and he is looking
into it now.

On Wed, Feb 11, 2015 at 1:04 PM, Jonathan Morgan 
wrote:

> Hi Anthony,
>
> Thanks for the ping. We're aware of the problem. Analytics is following up
> with the (volunteer) maintainer of stats.grok.se now. Hopefully this will
> be resolved soon, and we'll update this list. Cheers, J
>
> On Tue, Feb 10, 2015 at 7:17 PM, Anthony Oertel 
> wrote:
>
>> I have not seen updated statistics on Wikipedia article traffic
>> statistics.
>>
>> ___
>> Analytics mailing list
>> Analytics@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>>
>
>
> --
> Jonathan T. Morgan
> Community Research Lead
> Wikimedia Foundation
> User:Jmorgan (WMF) 
> jmor...@wikimedia.org
>
>
> ___
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>


Re: [Analytics] s1-analytics-slave

2015-02-09 Thread Kevin Leduc
I know that Erik Moller still uses geowiki a lot to look at the state of
the wikis.  I'm not ready to prioritize a transition of the geowiki code to
use wikimetrics (that's a whole project in itself).

Christian, can we just point the geowiki code to a different database?

On Thu, Feb 5, 2015 at 6:28 PM, Sean Pringle  wrote:

> On Fri, Feb 6, 2015 at 12:45 AM, Aaron Halfaker 
> wrote:
>
>> I've been slow to move some datasets off of s1-analytics-slave because it
>> remained available.  If I were given ~ a week notice, it would be no
>> problem to move all datasets and work to analytics-store.
>>
>> Am I reading correctly that you are suggesting that we might have *both*
>> dbstore1002 and dbstore2002 available?  Now, as far as having two machines
>> for querying, this would be valuable for separating the load of regular
>> jobs (e.g. dashboard scripts) from ad-hoc queries.
>>
>
> Correct, analytics would have access to both boxes, ideally with some
> logical production/ad-hoc traffic split.
>
> Important to note that dbstore2002 would not replicate dbstore1002 staging
> or datasets databases, just the upstream wikis and eventlogging. If you
> want staging on both you'd have to maintain it twice.
>
> Sean
>
> ___
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>


Re: [Analytics] Office Hours for EventLogging & Dashboarding

2015-01-14 Thread Kevin Leduc
Reminder:

Analytics Office Hours starts in 1 hour.  Bring your questions, issues,
thoughts on EventLogging and Limn Dashboards directly to the team
responsible for it.

To start a conversation with us, join the IRC channel  #wikimedia-analytics
and say Hello.  We’ll be standing by to talk and answer any questions.  We
can invite you to our Google Hangout if needed once the conversation has
started on IRC.

For those of you in the San Francisco office, we have reserved R37 Chambers
if you want to join me there.


On Mon, Dec 22, 2014 at 12:51 PM, Kevin Leduc  wrote:

> Please join the Analytics Engineering team for...
>
> Office Hours: EventLogging & Dashboarding
>
> Hosts: Dan and Nuria
>
> Date: January 14
>
> Time: 20:00 UTC - Convert to Local Time
> <http://www.timeanddate.com/worldclock/fixedtime.html?msg=EventLogging+and+Dashboarding+Office+Hours&iso=20150114T20&p1=%3A&ah=1>
>
> Hangout: https://plus.google.com/hangouts/_/wikimedia.org/a-batcave
>
> IRC: #wikimedia-analytics
>
> Description:
> Teams need metrics on how their product or feature is performing, then
> they need to visualize those metrics.  This is accomplished with
> instrumenting code with EventLogging, mashing data with some queries and
> setting up a Limn Dashboard.  The Analytics Engineering team is open for
> office hours to answer questions about the process, help solve any issues
> and listen to feedback on the process.  Feel free to drop in the Google
> Hangout linked above or ask questions on the IRC channel during our Office
> Hours.
>


Re: [Analytics] Beta Labs EventLogging logs

2015-01-13 Thread Kevin Leduc
Hey Ryan,

I want to make sure we address your needs as best we can so I am following
up on this.

Access permissions
What's the machine and the log you're trying to access (please be specific,
I am not a developer so assume I know very little).  I'll pass this on to
Ops so they can have a look at why permissions changed.  This shouldn't
happen.

Piping Events that fail validation.
It's in our backlog, it's relatively high priority, but not high enough for
us to have tasked it out yet or committed it to a sprint.  That'll happen
in February.
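
For anyone wondering what "piping events that fail validation" might amount to, here is a minimal sketch.  It is not EventLogging's actual code: the schema, the field names and the function names are all hypothetical, and real EventLogging schemas are richer than a field-to-type map.

```python
import json

# Hypothetical schema for illustration: field name -> required Python type.
SCHEMA = {"action": str, "pageId": int}


def validate(event, schema=SCHEMA):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for field, expected in schema.items():
        if field not in event:
            errors.append("missing field: %s" % field)
        elif not isinstance(event[field], expected):
            errors.append("wrong type for %s: expected %s"
                          % (field, expected.__name__))
    return errors


def route(raw_lines, schema=SCHEMA):
    """Split raw JSON lines into (valid_events, invalid) instead of dropping failures.

    Each entry in invalid is (raw_line, errors), so it can be written to a
    dedicated log that people can inspect.
    """
    valid, invalid = [], []
    for line in raw_lines:
        try:
            event = json.loads(line)
        except ValueError as exc:
            invalid.append((line, ["not valid JSON: %s" % exc]))
            continue
        if not isinstance(event, dict):
            invalid.append((line, ["event is not a JSON object"]))
            continue
        errors = validate(event, schema)
        if errors:
            invalid.append((line, errors))
        else:
            valid.append(event)
    return valid, invalid
```

The point of keeping the raw line next to its errors is that failures which only occur in production would show up somewhere readable rather than disappearing silently.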




On Fri, Jan 9, 2015 at 5:49 PM, Ryan Kaldari  wrote:

> Looks like I've lost permission to view those logs on Beta Labs again. Any
> chance you could fix them? Also, was any progress ever made on piping the
> live cluster errors into a dedicated log with easy access? I know it sounds
> like it wouldn't be that useful, but we have actually had cases where
> server-side EventLogging was failing on en.wiki, but working on Beta Labs
> and locally. It would also be useful for catching obscure failures that
> only happen for edge cases.
>
> Kaldari
>
> On Wed, Jan 7, 2015 at 1:27 PM, Ryan Kaldari 
> wrote:
>
>> Ah, sorry, I was looking on the wrong server (deployment-bastion). Thanks!
>>
>> On Wed, Jan 7, 2015 at 1:21 PM, Nuria Ruiz  wrote:
>>
>>> Ahem they are there:
>>>
>>> nuria@deployment-eventlogging02:/var/log/upstart$ ls eventlogging_*log
>>> eventlogging_processor-client-side-events.log
>>>  eventlogging_processor-server-side-events.log
>>>
>>> On Wed, Jan 7, 2015 at 12:57 PM, Ryan Kaldari 
>>> wrote:
>>>
>>>> It seems the EventLogging logs have disappeared from /var/log/upstart/
>>>> on Beta Labs (deployment-bastion). Does anyone know where they are now?
>>>>
>>>> Kaldari
>>>>
>>>> ___
>>>> Analytics mailing list
>>>> Analytics@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>
>>>>
>>>
>>> ___
>>> Analytics mailing list
>>> Analytics@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>
>>>
>>
>
> ___
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>


Re: [Analytics] Analytics Engineering Team Commitments 2014-12-11 -- 2014-12-25

2015-01-09 Thread Kevin Leduc
Following up on the last sprint by the Analytics Engineering team in 2014:

The team met all its commitments and even took on a few more tasks.  There
were no sprints or showcases over the holidays.  Our next showcase will be
Tuesday January 13, 2015, and I will forward the slide deck after the
presentations.

The completed tasks are here:
https://phabricator.wikimedia.org/sprint/board/935/query/all/

The latest versions of Wikimetrics and Vital Signs were deployed last night.

Wikimetrics: https://metrics.wmflabs.org/

Vital Signs: https://metrics.wmflabs.org/static/public/dash/

Cheers,

Kevin (Analytics Product Manager)


On Thu, Dec 11, 2014 at 3:02 PM, Kevin Leduc  wrote:

> Hello,
>
> It has been a while since the last email of this kind.  The team continued
> its bi-weekly sprints around Columbus Day, US Thanksgiving and through the
> switch from Bugzilla to Phabricator.  We have now re-organized our
> processes around Phabricator and are excited to see how this tool will
> display our point burndown during the sprint.
>
>
> Start: 2014-12-11
> End: 2014-12-23
> Theme: Wikimetrics
> Theme song: All Along the Watchtower
> <https://en.wikipedia.org/wiki/All_Along_the_Watchtower>
> Point Commitment: 62
> # of Tasks: 7
> Burndown Chart: https://phabricator.wikimedia.org/sprint/view/935/
> Sprint Board: https://phabricator.wikimedia.org/sprint/board/935/query/all/
>
>
>
> Note: emails have not gone out for the last few sprints, but you can see
> the slideshows of their showcases below.
>
> Sprint ending 2014-12-09:
> https://docs.google.com/presentation/d/1LmkWEpcJD0-AtQMRmLEFSM-T_9hNCWVGvCvLLYIrkBQ/edit?usp=sharing
>
> Sprint ending 2014-11-25:
> https://docs.google.com/presentation/d/1siaxV4CVzx-Rqbs9zrEC9lmfhijweaNuRz1Tm_oyNcE/edit?usp=sharing
>
> Sprint ending 2014-11-12:
> https://docs.google.com/presentation/d/1XTy0yLCCKFk-CFKXiiUAZd1uVYL61vwAlOo6ofo3SqA/edit?usp=sharing
>
> Cheers,
>
> Kevin (Analytics Product Manager)
>
> PS
> From now on, the team will vote on a theme song at the sprint planning
> session.  The song must start with the next letter in the alphabet of the
> previous song and will loosely reflect the mood of the team.
>


Re: [Analytics] Office Hours for EventLogging & Dashboarding

2015-01-08 Thread Kevin Leduc
We will talk about Limn dashboards only.

On Wed, Jan 7, 2015 at 8:47 PM, Gilles Dubuc  wrote:

> Are we talking about limn dashboards or will this cover other dashboarding
> tools as well?
>
> On Wed, Jan 7, 2015 at 11:11 PM, Kevin Leduc  wrote:
>
>> Reminder, the Analytics Engineering team has office hours Wednesday next
>> week to assist with EventLogging and Dashboards.  If you're at the San
>> Francisco office, you can join us in room R35 Chambers.
>>
>> If you have any questions about the event, let me know.  Thanks!
>>
>>
>> On Mon, Dec 22, 2014 at 12:51 PM, Kevin Leduc 
>> wrote:
>>
>>> Please join the Analytics Engineering team for...
>>>
>>> Office Hours: EventLogging & Dashboarding
>>>
>>> Hosts: Dan and Nuria
>>>
>>> Date: January 14
>>>
>>> Time: 20:00 UTC - Convert to Local Time
>>> <http://www.timeanddate.com/worldclock/fixedtime.html?msg=EventLogging+and+Dashboarding+Office+Hours&iso=20150114T20&p1=%3A&ah=1>
>>>
>>> Hangout: https://plus.google.com/hangouts/_/wikimedia.org/a-batcave
>>>
>>> IRC: #wikimedia-analytics
>>>
>>> Description:
>>> Teams need metrics on how their product or feature is performing, then
>>> they need to visualize those metrics.  This is accomplished with
>>> instrumenting code with EventLogging, mashing data with some queries and
>>> setting up a Limn Dashboard.  The Analytics Engineering team is open for
>>> office hours to answer questions about the process, help solve any issues
>>> and listen to feedback on the process.  Feel free to drop in the Google
>>> Hangout linked above or ask questions on the IRC channel during our Office
>>> Hours.
>>>
>>
>>
>> ___
>> Analytics mailing list
>> Analytics@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>>
>
> ___
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>


Re: [Analytics] Office Hours for EventLogging & Dashboarding

2015-01-07 Thread Kevin Leduc
Reminder, the Analytics Engineering team has office hours Wednesday next
week to assist with EventLogging and Dashboards.  If you're at the San
Francisco office, you can join us in room R35 Chambers.

If you have any questions about the event, let me know.  Thanks!


On Mon, Dec 22, 2014 at 12:51 PM, Kevin Leduc  wrote:

> Please join the Analytics Engineering team for...
>
> Office Hours: EventLogging & Dashboarding
>
> Hosts: Dan and Nuria
>
> Date: January 14
>
> Time: 20:00 UTC - Convert to Local Time
> <http://www.timeanddate.com/worldclock/fixedtime.html?msg=EventLogging+and+Dashboarding+Office+Hours&iso=20150114T20&p1=%3A&ah=1>
>
> Hangout: https://plus.google.com/hangouts/_/wikimedia.org/a-batcave
>
> IRC: #wikimedia-analytics
>
> Description:
> Teams need metrics on how their product or feature is performing, then
> they need to visualize those metrics.  This is accomplished with
> instrumenting code with EventLogging, mashing data with some queries and
> setting up a Limn Dashboard.  The Analytics Engineering team is open for
> office hours to answer questions about the process, help solve any issues
> and listen to feedback on the process.  Feel free to drop in the Google
> Hangout linked above or ask questions on the IRC channel during our Office
> Hours.
>


Re: [Analytics] Only parts of EventLogging events getting written to the database since 2015-01-07 ~1:55

2015-01-07 Thread Kevin Leduc
Hey Ryan, I put this bug on our agenda for our tasking meeting so we can
scope it out and decide if we can commit to accomplishing it in the next
sprint.

On Wed, Jan 7, 2015 at 1:46 PM, Nuria Ruiz  wrote:

> Kaldari:
>
> Expanding a bit to what Dan said:
>
> We took over EL from Ori basically 6 months ago. The operational support
> Analytics provides is documented here:
> https://www.mediawiki.org/wiki/EventLogging/OperationalSupport
>
> EL has several parts and while we have not done much development on the mw
> extension we have done, together with Ori, quite a bit of work on the
> server side of it as otherwise EL could not have scaled to the level it's at
> right now:
>
>
> https://github.com/wikimedia/mediawiki-extensions-EventLogging/tree/master/server
>
> By all means poke us about bugs you feel need more attention.
>
> Thanks,
>
> Nuria
>
>
>
>
>
>
>
>
> On Wed, Jan 7, 2015 at 12:23 PM, Dan Andreescu 
> wrote:
>
>> Ryan - I'm sorry I was not aware of this.  The Analytics team is
>> responsible for Event Logging, and you can ping any of us if we're not
>> paying attention to an issue.
>>
>> Christian has been largely taking care of EL by himself, and was kept
>> quite busy with Event Logging reliability and the need to backfill lost
>> data.  As Christian transitions away from our team, the responsibility
>> falls on the rest of us, and I personally am getting up to speed with it.
>> The bug you mentioned, https://phabricator.wikimedia.org/T78325, sounds
>> like a pain and I'm happy to work on it to learn more about EL.  I will
>> bring it up with Kevin and have him respond here if it's *not* a priority.
>>
>> On Wed, Jan 7, 2015 at 3:16 PM, Ryan Kaldari 
>> wrote:
>>
>>> Who is actually maintaining the EventLogging Extension now? As far as I
>>> can tell, none of the members of the Analytics-EventLogging project in
>>> Phabricator are developers. This makes it hard to know who to ping when
>>> there is a problem. For example, this EL bug that I filed a month ago was
>>> never triaged or replied to, and I'm not sure who to poke about it:
>>> https://phabricator.wikimedia.org/T78325
>>>
>>> On Wed, Jan 7, 2015 at 11:32 AM, Dan Andreescu wrote:
>>>
>>>>> Folks -- thanks for owning this. One concern -- this is the second
>>>>> deployment related problem in the last couple of months. I'm concerned
>>>>> that we need to investigate more resources in a testing environment as
>>>>> well as a deployment checklist. I'm also considering having EL added to
>>>>> Greg's deployment calendar (with the accompanying restrictions) since
>>>>> it's approaching being a core service.
>>>>>
>>>>> Thoughts?
>>>>>
>>>>
>>>> I think all that's needed here is a deployment checklist.  The problem
>>>> was that a change was not tested in beta labs before being deployed.  We
>>>> could automate that and then take it off the checklist at a later time.
>>>> But as long as it happens somehow, I think we should be safe enough.
>>>>
>>>> ___
>>>> Analytics mailing list
>>>> Analytics@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>
>>>>
>>>
>>> ___
>>> Analytics mailing list
>>> Analytics@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>
>>>
>>
>> ___
>> Analytics mailing list
>> Analytics@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>>
>
> ___
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>


[Analytics] Office Hours for EventLogging & Dashboarding

2014-12-22 Thread Kevin Leduc
Please join the Analytics Engineering team for...

Office Hours: EventLogging & Dashboarding

Hosts: Dan and Nuria

Date: January 14

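Time: 20:00 UTC - Convert to Local Time
<http://www.timeanddate.com/worldclock/fixedtime.html?msg=EventLogging+and+Dashboarding+Office+Hours&iso=20150114T20&p1=%3A&ah=1>
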
Hangout: https://plus.google.com/hangouts/_/wikimedia.org/a-batcave

IRC: #wikimedia-analytics

Description:
Teams need metrics on how their product or feature is performing, then they
need to visualize those metrics.  This is accomplished with instrumenting
code with EventLogging, mashing data with some queries and setting up a
Limn Dashboard.  The Analytics Engineering team is open for office hours to
answer questions about the process, help solve any issues and listen to
feedback on the process.  Feel free to drop in the Google Hangout linked
above or ask questions on the IRC channel during our Office Hours.
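
If you would rather work out the local start time yourself, a small sketch follows.  The date and time come from this announcement; the function name is made up, and the UTC offsets are ones you supply yourself (for example -8 for San Francisco in January).

```python
from datetime import datetime, timedelta, timezone

# Office hours start, taken from the announcement: January 14, 2015, 20:00 UTC.
START_UTC = datetime(2015, 1, 14, 20, 0, tzinfo=timezone.utc)


def local_start(utc_offset_hours):
    """Return the start time in a zone with the given fixed UTC offset."""
    return START_UTC.astimezone(timezone(timedelta(hours=utc_offset_hours)))


# Example: San Francisco is on UTC-8 in January.
print(local_start(-8).strftime("%Y-%m-%d %H:%M"))
```

A fixed offset keeps the sketch free of any time zone database; it does not account for daylight saving, which is not in effect in January in either the US or Europe.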


[Analytics] FYI: "Analytics-Refinery" project renamed in Phabricator

2014-12-19 Thread Kevin Leduc
The new name is *Analytics-Cluster*; the URL is still the same [1]

We changed the name because some of the tasks logged in this project went
beyond the scope of the Refinery code repository and involved actual work
on the cluster.

More on the project is on Wikitech [2]

[1] https://phabricator.wikimedia.org/project/view/655/
[2] https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Overview


Re: [Analytics] EventLogging data QA

2014-12-15 Thread Kevin Leduc
I reopened the task because discussions on this are still ongoing and the
issue isn't entirely resolved.

I'd like to move this to a video conference call between analytics
developers and analytics engineering to come to a mutual understanding of
what the current pain points are and what's the biggest priority.  We'll
then communicate a plan back to the list and update the tasks involved.



On Mon, Dec 15, 2014 at 4:37 PM, Nuria Ruiz  wrote:
>
> >QA in beta labs is good but not enough. We still need to do QA when a
> feature goes to production and currently
> This is true but at the same time, I do not see anything in the
> description of your FF events that could not be tested on beta-labs. If we
> are talking ad-block that can be tested even earlier, Vagrant will be a
> fine venue. All the issues related to the client (browser) not emitting
> events can be tested on the development environment with ease.
>
>
>
> On Mon, Dec 15, 2014 at 4:18 PM, Leila Zia  wrote:
>>
>>
>> On Mon, Dec 15, 2014 at 10:06 AM, Toby Negrin 
>> wrote:
>>>
>>> I share Christian's concerns -
>>>
>>> Dario/Leila - can you comment based on your recent experiences with
>>> WikiGrok?
>>>
>>
>> I agree with Christian.
>>
>> QA in beta labs is good but not enough. We still need to do QA when a
>> feature goes to production and currently, it's very hard to figure out if
>> there's a problem with logging. An example:
>>
>> While testing WikiGrok in production, we learned that after some point
>> tests from Firefox browser from my machine were not logged. We did not get
>> any errors for this. I found out about this because I was trying to
>> manually make a trace of activities and see if I can stitch them together
>> and make sense of them. We eventually figured out what was going on in that
>> case [1], but it concerns me that there may be other important events that
>> we don't log in the DB and we never know that we're not logging.
>>
>> Leila
>> [1]
>> https://lists.wikimedia.org/pipermail/analytics/2014-December/002864.html
>>
>>
>>>
>>> Thanks
>>>
>>> -Toby
>>>
>>>
>>> > On Dec 15, 2014, at 9:42 AM, Christian Aistleitner <
>>> christ...@quelltextlich.at> wrote:
>>> >
>>> > Hi,
>>> >
>>> >> On Mon, Dec 15, 2014 at 08:34:39AM -0800, Kevin Leduc wrote:
>>> >> I closed the Phabricator task with links to this thread and the
>>> wikitech
>>> >> doc for testing on beta cluster.
>>> >
>>> > I am fine with keeping the task closed.
>>> >
>>> > But I am somewhat surprised to see beta mentioned in the
>>> > resolution. Note that Dario's request set scope as [1]
>>> >
>>> >  However, there are types of data quality issues that we only
>>> >  discover when collecting data at scale and in the wild (on
>>> >  browsers/platforms that we don’t necessarily test for internally).
>>> >
>>> > . That's a valid scope, but from my point of view, beta does not match
>>> > that scope.
>>> >
>>> > Neither is beta large scale, nor is it hammered on with crazy devices.
>>> >
>>> > Beta is just halving the distance between EventLogging's devserver
>>> > (Vagrant!) and production.
>>> >
>>> > Have fun,
>>> > Christian
>>> >
>>> >
>>> >
>>> > [1]
>>> https://lists.wikimedia.org/pipermail/analytics/2014-December/002884.html
>>> >
>>> >
>>> >
>>> > --
>>> >  quelltextlich e.U.  \\  Christian Aistleitner 
>>> >   Companies' registry: 360296y in Linz
>>> > Christian Aistleitner
>>> > Kefermarkterstrasze 6a/3 Email:  christ...@quelltextlich.at
>>> > 4293 Gutau, Austria  Phone:  +43 7946 / 20 5 81
>>> > Fax:+43 7946 / 20 5 81
>>> > Homepage: http://quelltextlich.at/
>>> > ---


[Analytics] EventLogging workshop at the Wikimedia Developer Summit (WMDS)

2014-12-15 Thread Kevin Leduc
I have updated our team's workshop entry at the WMDS.  The Analytics
Engineering team wants to lead an EventLogging workshop.  If you are
interested in attending, please add your name to the list in this section:
https://www.mediawiki.org/wiki/MediaWiki_Developer_Summit_2015#Setting_up_EventLogging_and_a_Dashboard

The more people add their name to the list, the more likely this will
happen!


Re: [Analytics] EventLogging data QA

2014-12-15 Thread Kevin Leduc
I closed the Phabricator task with links to this thread and the wikitech
doc for testing on the beta cluster.
https://phabricator.wikimedia.org/T78355


On Mon, Dec 15, 2014 at 7:35 AM, Nuria Ruiz  wrote:
>
> >But I see that meanwhile a Phabricator task got added, and I guess I
> >am alone with my judgement :-)
> Actually, I fully agree with you that no more infrastructure in this
> regard is needed and I think we were a little fast filing tasks here. I
> really think that every time we find ourselves testing in production we
> should evaluate what we can do better in the testing pipeline but not augment
> production with more "testing" tools.
>
> For now we should be able to help in irc and do as much testing as
> possible in beta labs. How to access data in beta labs is documented here:
> https://wikitech.wikimedia.org/wiki/EventLogging/Testing/BetaLabs
>
> I talked to the mobile team about testing in beta labs (as it was an issue
> with mobile instrumentation that sparked this discussion) and they have used
> it recently.
>
> Thanks,
>
> Nuria
>
>
>
>
>
>
>
>
> On Mon, Dec 15, 2014 at 6:45 AM, Christian Aistleitner <
> christ...@quelltextlich.at> wrote:
>
>> Hi Dario,
>>
>> On Thu, Dec 11, 2014 at 04:11:49PM -0800, Dario Taraborelli wrote:
>> > I am kicking off this thread [...]
>>
>> Thanks!
>>
>>
>>
>> > However, there are types of data quality issues that we only
>> > discover when collecting data at scale and in the wild (on
>> > browsers/platforms that we don’t necessarily test for internally).
>>
>> Full ACK.
>>
>> However, that sounds like we're only talking about schemas where the
>> collection code got tested using Vagrant or beta, and is known to work
>> on the relevant portion of the traffic.
>>
>> And since you say that it's on browsers/platforms that we don't
>> necessarily test for internally, I assume we're actually talking only
>> about a small fraction of the traffic.
>>
>> I assume that scope for the rest of the reply.
>>
>>
>>
>> > is there a way to inspect invalid events in near real time without
>> > having access to vanadium?
>>
>> * Urgent, ad-hoc needs
>>
>> For urgent, ad-hoc needs, (which should happen really seldom, given
>> the scope), ping us in IRC in #wikimedia-analytics.
>> At least qchris, milimetric, and nuria should be able to ssh into
>> vanadium and can take a look right away.
>>
>> If none of them are around, Ops of course have access to the relevant
>> files on vanadium [1]. And since we're in the case of urgent, ad-hoc
>> needs, I am sure they'd help out.
>>
>>
>> * Not so urgent needs
>>
>> For not so urgent needs, since it's only a small fraction of the
>> traffic, I am not sure real-time need is worth it.
>>
>> Sure it would be nice to provide near real-time access to those files,
>> but we should also get the cluster into a more reliable state,
>> implement UDFs for researchers to make their lives easier, and get the
>> data-warehouse up and running ;-)
>>
>>
>>
>> But I see that meanwhile a Phabricator task got added, and I guess I
>> am alone with my judgement :-)
>>
>> Have fun,
>> Christian
>>
>>
>>
>> [1] Either
>>
>>   /srv/log/eventlogging/client-side-events.log
>>
>> or
>>
>>   /srv/log/eventlogging/server-side-events.log
>>
>> depending on the kind of event you're looking for.
>>
>>
>>
>> --
>>  quelltextlich e.U.  \\  Christian Aistleitner 
>>Companies' registry: 360296y in Linz
>> Christian Aistleitner
>> Kefermarkterstrasze 6a/3 Email:  christ...@quelltextlich.at
>> 4293 Gutau, Austria  Phone:  +43 7946 / 20 5 81
>>  Fax:+43 7946 / 20 5 81
>>  Homepage: http://quelltextlich.at/
>> ---
>>


[Analytics] Analytics Engineering Team Commitments 2014-12-11 -- 2014-12-25

2014-12-11 Thread Kevin Leduc
Hello,

It has been a while since the last email of this kind.  The team continued
its bi-weekly sprints around Columbus Day, US Thanksgiving and through the
switch from Bugzilla to Phabricator.  We have now re-organized our
processes around Phabricator and are excited to see how this tool will
display our point burndown during the sprint.


Start:             2014-12-11
End:               2014-12-23
Theme:             Wikimetrics
Theme song:        All Along the Watchtower
Point Commitment:  62
# of Tasks:        7
Burndown Chart:    https://phabricator.wikimedia.org/sprint/view/935/
Sprint Board:      https://phabricator.wikimedia.org/sprint/board/935/query/all/



Note: emails have not gone out for the last few sprints, but you can see
the slideshows of their showcases below.

Sprint ending 2014-12-09:
https://docs.google.com/presentation/d/1LmkWEpcJD0-AtQMRmLEFSM-T_9hNCWVGvCvLLYIrkBQ/edit?usp=sharing

Sprint ending 2014-11-25:
https://docs.google.com/presentation/d/1siaxV4CVzx-Rqbs9zrEC9lmfhijweaNuRz1Tm_oyNcE/edit?usp=sharing

Sprint ending 2014-11-12:
https://docs.google.com/presentation/d/1XTy0yLCCKFk-CFKXiiUAZd1uVYL61vwAlOo6ofo3SqA/edit?usp=sharing

Cheers,

Kevin (Analytics Product Manager)

PS
From now on, the team will vote on a theme song at the sprint planning
session.  The song must start with the next letter in the alphabet of the
previous song and will loosely reflect the mood of the team.


Re: [Analytics] Analytics Dev Team Commitments 2014-10-30 -- 2014-11-11

2014-11-19 Thread Kevin Leduc
The sprint [1] ended a day later, on Wednesday, November 12th, because the
San Francisco office was closed for the November 11th holiday.  We also held
our showcase on Wednesday and the slides are linked below [2].

The team completed 4 of 6 stories; that’s 18 of 86 points.  Two stories
worth 34 points each did not progress as quickly as hoped due to
dependencies on other teams.  The team has carried these unfinished
stories over into the next sprint.
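
As a sanity check on these figures, here is a minimal Python sketch that
tallies the story points listed in the kickoff email quoted below.  The set
of completed bug IDs is an inference, not something the sprint report
lists: it assumes the four finished stories are simply the ones other than
the two 34-point stories.

```python
# Story points per Bugzilla ID, copied from the 2014-10-30 kickoff email.
committed = {72740: 34, 72741: 0, 72642: 0, 67450: 34, 72746: 5, 72635: 13}

# ASSUMPTION: the completed stories are all those except the two 34-point
# stories; the sprint report does not name them explicitly.
completed = {72741, 72642, 72746, 72635}


def sprint_summary(committed, completed):
    """Return story and point totals for one sprint."""
    points_total = sum(committed.values())
    points_done = sum(pts for bug, pts in committed.items() if bug in completed)
    return {
        "stories_done": sum(1 for bug in committed if bug in completed),
        "stories_total": len(committed),
        "points_done": points_done,
        "points_total": points_total,
        "points_carried_over": points_total - points_done,
    }


print(sprint_summary(committed, completed))
```

Under that assumption the tally matches the report: 4 of 6 stories, 18 of
86 points, with the remaining 68 points carried over.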

[1] The Sprint - http://sb.wmflabs.org/t/analytics-developers/2014-10-30/

[2] Showcase Deck -
https://docs.google.com/presentation/d/1XTy0yLCCKFk-CFKXiiUAZd1uVYL61vwAlOo6ofo3SqA/edit?usp=sharing


On Thu, Oct 30, 2014 at 5:46 PM, Kevin Leduc  wrote:

> Hello,
>
> We kicked off our next sprint this morning, with the help of some release
> planning executed during the last 2 weeks.  The sprint status is here:
> http://sb.wmflabs.org/t/analytics-developers/2014-10-30/
>
> The focus of this sprint is working on the backend in preparation to
> display new data in Vital Signs.
>
> Bug ID  Component     Summary                                                       Points
> 72740   Dashiki       Story: Vital Signs User selects the Daily Pageviews metrics   34
> 72741   EventLogging  List tables/schemas with data retention needs                 0
> 72642   EventLogging  Story: Identify and direct the purging of Event logging raw   0
>                       logs older than 90 days in stat1002
> 67450   EventLogging  database consumer could batch inserts (sometimes)             34
> 72746   Wikimetrics   Story: WikimetricsUser tags a cohort using a pre-defined tag  5
> 72635   Wikimetrics   report table performance, cleanup, and number of items        13
>
> That’s 86 points in 4 stories.
>
> The bugs with 0 points are tasks for the team to track and follow up on,
> and the work mostly falls on other teams.
>
> Regards,
> Kevin Leduc
>


[Analytics] data in Vital Signs

2014-11-04 Thread Kevin Leduc
On October 30, updates were made to how metrics are calculated for the
Vital Signs dashboards [1].  The result is an apparent jump up or down on
some of the metrics starting October 30th.  This is because we did not
update the existing historical data.  We are planning on recalculating all
the historical data after more work is done on the backend over the next
month.

Here are the changes on the metrics:

- Namespace Edits and Pages Created now include pages in all namespaces and
pages that have been deleted.  The plots for these metrics generally show a
step up.  On wikis where most of activity occurs on pages other than in
namespace '0' (like Meta and Commons), you can see a dramatic difference in
the data [2].

- We exclude bots in Rolling Active Editor, Rolling Surviving New Active
Editor and Rolling Recurring Old Active Editor.  The plots for these
metrics show a small step down.  This change is not as conspicuous as the
previous one.

If you were relying on this data right now, and need it consistent across
time, please speak up.
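
To make the two definition changes above concrete, here is a small Python
sketch over made-up edit records.  The `Edit` fields and both function
names are hypothetical and are not taken from the Wikimetrics code; the
assumption that the earlier calculation counted only namespace 0 on
non-deleted pages is read off the description above rather than confirmed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """Hypothetical edit record, for illustration only."""
    user: str
    namespace: int
    is_bot: bool
    page_deleted: bool


def namespace_edits(edits, all_namespaces=True, include_deleted=True):
    """Count edits; both flags False approximates the assumed old behaviour."""
    return sum(
        1
        for e in edits
        if (all_namespaces or e.namespace == 0)
        and (include_deleted or not e.page_deleted)
    )


def active_editors(edits, exclude_bots=True):
    """Return the set of users with edits, optionally dropping bots."""
    return {e.user for e in edits if not (exclude_bots and e.is_bot)}


# Made-up sample data: one File-namespace edit, one bot, one deleted page.
sample = [
    Edit("A", 0, False, False),
    Edit("B", 6, False, False),
    Edit("Bot", 0, True, False),
    Edit("A", 0, False, True),
]

old_count = namespace_edits(sample, all_namespaces=False, include_deleted=False)
new_count = namespace_edits(sample)
```

With this sample the edit count steps up (new definition counts more
rows) while the editor set shrinks once bots are removed, which is the
same direction of change described for the plots.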


[1] https://metrics.wmflabs.org/static/public/dash/
[2]
https://metrics.wmflabs.org/static/public/dash/#projects=commonswiki,metawiki/metrics=PagesCreated


[Analytics] Analytics Dev Team Commitments 2014-10-30 -- 2014-11-11

2014-10-30 Thread Kevin Leduc
Hello,

We kicked off our next sprint this morning, with the help of some release
planning executed during the last 2 weeks.  The sprint status is here:
http://sb.wmflabs.org/t/analytics-developers/2014-10-30/

The focus of this sprint is working on the backend in preparation to
display new data in Vital Signs.

Bug ID  Component     Summary                                                       Points
72740   Dashiki       Story: Vital Signs User selects the Daily Pageviews metrics   34
72741   EventLogging  List tables/schemas with data retention needs                 0
72642   EventLogging  Story: Identify and direct the purging of Event logging raw   0
                      logs older than 90 days in stat1002
67450   EventLogging  database consumer could batch inserts (sometimes)             34
72746   Wikimetrics   Story: WikimetricsUser tags a cohort using a pre-defined tag  5
72635   Wikimetrics   report table performance, cleanup, and number of items        13

That’s 86 points in 4 stories.

The bugs with 0 points are tasks for the team to track and follow up on,
and the work mostly falls on other teams.

Regards,
Kevin Leduc


Re: [Analytics] Analytics Dev Team Commitments 2014-10-16 -- 2014-10-28

2014-10-30 Thread Kevin Leduc
Hello,

The team completed all the tasks committed to in this past sprint - that’s
71 points. http://sb.wmflabs.org/t/analytics-developers/2014-10-16/

The improvements to Wikimetrics are currently on our staging (testing)
environment and will be published on the live server
(http://metrics.wmflabs.org) later this afternoon.  An email went out on
the wikimetrics mailing list earlier today.

Slides from the showcase presented on Tuesday are also public here:

https://docs.google.com/presentation/d/1Phslf7NZvnaAThrt5B39U6q38mQ4lhWUuOnrcvD53AY/edit?usp=sharing

cheers,
Kevin Leduc

On Thu, Oct 16, 2014 at 5:12 PM, Kevin Leduc  wrote:

> Hello,
>
> The Analytics Development Team kicked off a sprint this morning.  You can
> follow here:
>
> http://sb.wmflabs.org/t/analytics-developers/2014-10-16/
>
> The theme for this sprint is fixing the metrics.
>
> BugID  Component    Summary                                                  Points
> 71255  Wikimetrics  Story: WikimetricsUser downloads large CSV               8
>                     <https://bugzilla.wikimedia.org/show_bug.cgi?id=71255>
>                     <http://sb.wmflabs.org/b/71255/>
> 66843  Wikimetrics  Story: User creates cohort with CentralAuth insertions   21
>                     <https://bugzilla.wikimedia.org/show_bug.cgi?id=66843>
>                     <http://sb.wmflabs.org/b/66843/>
> 72114  Wikimetrics  Story: VSUser has corrected historical edits/pages data  8
>                     <https://bugzilla.wikimedia.org/show_bug.cgi?id=72114>
>                     <http://sb.wmflabs.org/b/72114/>
> 72134  Wikimetrics  Story: VSUser has bots filtered out of all metrics       34
>                     <https://bugzilla.wikimedia.org/show_bug.cgi?id=72134>
>                     <http://sb.wmflabs.org/b/72134/>
>
> That’s 71 points in 4 stories.
>
> During the sprint, the team will also start work on optimizing metric
> generation (Story #  69145: Creating an “editor_day" table - 34 points) but
> cannot commit to completing work in this sprint.
>
> One more thing: Marcel Ruiz Forns has joined the team.  He will be
> focusing on developing features for Wikimetrics in support of the Grant
> Making team.  This sprint, he is tackling Story  66843
> <https://bugzilla.wikimedia.org/show_bug.cgi?id=66843>.
>
> cheers,
> Kevin Leduc
>


Re: [Analytics] Analytics dev points

2014-10-16 Thread Kevin Leduc
Hi Pine,

Here's some documentation on the Analytics Team's methodology, and
particularly the point scale:
https://www.mediawiki.org/wiki/Analytics/Development_Process#Planning_Poker

This morning the team tasked out some high priority features we need to
build and then voted on how many points to assign to each story.  At our
sprint planning meeting, we used the points to inform us on how much work
we can commit to accomplishing in the next Sprint based on past Sprint
velocity: http://sb.wmflabs.org/t/analytics-developers/
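
For anyone curious how velocity can feed into a commitment, here is a rough
Python sketch.  The three velocity figures are the completed points
reported on this list for the three preceding sprints (45, 55 and 34); the
backlog entries, the function names and the simple "take stories in
priority order while they fit" rule are assumptions made for illustration
and are not the team's documented procedure.

```python
from statistics import mean


def velocity(completed_points, window=3):
    """Average completed points over the most recent `window` sprints."""
    return mean(completed_points[-window:])


def plan_sprint(backlog, capacity):
    """Pick stories in priority order, skipping any that would not fit."""
    chosen, used = [], 0
    for name, points in backlog:
        if used + points <= capacity:
            chosen.append(name)
            used += points
    return chosen, used


# Completed points from the three sprint reports before this email.
past = [45, 55, 34]

# Hypothetical backlog in priority order: (story name, estimated points).
backlog = [("story A", 21), ("story B", 34), ("story C", 8), ("story D", 13)]

capacity = velocity(past)
chosen, used = plan_sprint(backlog, capacity)
```

In practice the team also weighs dependencies and who is available, so a
mechanical rule like this is only a starting point for the discussion.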



On Thu, Oct 16, 2014 at 5:22 PM, Dan Garry  wrote:

> In Agile methodologies, story points are an arbitrary unit [1] of measurement
> for the difficulty of completing a story. The number of points a story has
> corresponds, roughly, to the amount of time the story will take to complete.
> Story points are decided by the team of engineers implementing the story.
>
> You might find this enlightening:
> http://programmers.stackexchange.com/questions/182057/why-do-we-use-story-points-instead-of-man-days-when-estimating-user-stories
>
> Dan
>
> [1]: https://en.wikipedia.org/wiki/Arbitrary_unit
>
> On 16 October 2014 17:15, Pine W  wrote:
>
>> I apologize if this is an elementary question, but what are points used
>> to quantify when doing analytics development and how are points assigned?
>>
>> Pine
>>
>
>
> --
> Dan Garry
> Associate Product Manager, Mobile Apps
> Wikimedia Foundation
>


[Analytics] Analytics Dev Team Commitments 2014-10-16 -- 2014-10-28

2014-10-16 Thread Kevin Leduc
Hello,

The Analytics Development Team kicked off a sprint this morning.  You can
follow here:

http://sb.wmflabs.org/t/analytics-developers/2014-10-16/

The theme for this sprint is fixing the metrics.

BugID  Component    Summary                                                  Points
71255  Wikimetrics  Story: WikimetricsUser downloads large CSV               8
                    <https://bugzilla.wikimedia.org/show_bug.cgi?id=71255>
                    <http://sb.wmflabs.org/b/71255/>
66843  Wikimetrics  Story: User creates cohort with CentralAuth insertions   21
                    <https://bugzilla.wikimedia.org/show_bug.cgi?id=66843>
                    <http://sb.wmflabs.org/b/66843/>
72114  Wikimetrics  Story: VSUser has corrected historical edits/pages data  8
                    <https://bugzilla.wikimedia.org/show_bug.cgi?id=72114>
                    <http://sb.wmflabs.org/b/72114/>
72134  Wikimetrics  Story: VSUser has bots filtered out of all metrics       34
                    <https://bugzilla.wikimedia.org/show_bug.cgi?id=72134>
                    <http://sb.wmflabs.org/b/72134/>

That’s 71 points in 4 stories.

During the sprint, the team will also start work on optimizing metric
generation (Story #  69145: Creating an “editor_day" table - 34 points) but
cannot commit to completing work in this sprint.

One more thing: Marcel Ruiz Forns has joined the team.  He will be focusing
on developing features for Wikimetrics in support of the Grant Making
team.  This sprint, he is tackling Story  66843
<https://bugzilla.wikimedia.org/show_bug.cgi?id=66843>.

cheers,
Kevin Leduc


Re: [Analytics] eventlogging largest tables

2014-10-08 Thread Kevin Leduc
Sean,

I made a spreadsheet to help track what has been requested.
https://docs.google.com/a/wikimedia.org/spreadsheets/d/1RAhDbppfWDQsUXXr7r_5-7GFMgdwqbk28i1Df70Q4oU/edit?usp=sharing

Let us know if you need more information before you can start
deleting old records.
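
To illustrate what the purge_schedule table proposed in the quoted message
below would drive, here is a minimal Python sketch that turns
(table_name, days) rows into DELETE statements.  This is not Sean's stored
procedure; it assumes each EventLogging table has a `timestamp` column in
MediaWiki's YYYYMMDDHHMMSS format, and the function name is made up.

```python
from datetime import datetime, timedelta


def purge_statements(schedule, now):
    """Build one DELETE per (table_name, days) row of purge_schedule."""
    statements = []
    for table, days in schedule:
        # Only allow plain identifiers so a bad row cannot inject SQL.
        if not table.replace("_", "").isalnum():
            raise ValueError(f"unexpected table name: {table!r}")
        cutoff = (now - timedelta(days=days)).strftime("%Y%m%d%H%M%S")
        # ASSUMPTION: rows carry a `timestamp` column in YYYYMMDDHHMMSS form.
        statements.append(
            f"DELETE FROM `{table}` WHERE timestamp < '{cutoff}';"
        )
    return statements


# Example row taken from the quoted INSERT statement.
schedule = [("MultimediaTiming_7193302", 30)]
statements = purge_statements(schedule, now=datetime(2014, 10, 8))
```

On large tables a real job would probably delete in batches rather than in
one statement, to keep replication lag down.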


On Wed, Oct 8, 2014 at 6:49 AM, Gilles Dubuc  wrote:

> Sounds great, Sean! The following tables can be set to keeping 40 days of
> data:
>
> MediaViewer_6054199
> MediaViewer_6055641
> MediaViewer_6066908
> MediaViewer_6636420
> MediaViewer_7670440
> MediaViewer_8245578
> MediaViewer_8572637
> MediaViewer_8935662
> MediaViewer_9792855
> MediaViewer_9989959
> MultimediaViewerAttribution_9758179
> MultimediaViewerDimensions_10014238
> MultimediaViewerDuration_8318615
> MultimediaViewerDuration_8572641
> MultimediaViewerNetworkPerformance_7393226
> MultimediaViewerNetworkPerformance_7488625
> MultimediaViewerNetworkPerformance_7917896
>
> There's a good chance that some of the older ones will end up being empty,
> in which case they can be safely dropped.
>
>
>
> On Mon, Oct 6, 2014 at 5:22 PM, Sean Pringle 
> wrote:
>
>> On Fri, Oct 3, 2014 at 2:28 AM, Gilles Dubuc 
>> wrote:
>>
>>> We can trim down our team (multimedia)'s tables considerably by getting
>>> rid of data older than 30 days. This could even be done by a daily cron.
>>> How would we go about doing that? Should we be the ones taking care of it?
>>> I'm not sure that the DB credentials I currently have can delete content.
>>>
>>
>> We can automate purging using the MariaDB Event Scheduler [1] if
>> you guys want a once-off-set-and-forget solution. Eg:
>>
>> CREATE TABLE purge_schedule (
>>   table_name varchar(100) NOT NULL,
>>   days tinyint(3) unsigned NOT NULL
>> );
>>
>> Then for each EL table you would do:
>>
>> INSERT INTO purge_schedule VALUES ('MultimediaTiming_7193302', 30);
>>
>> The rest would be left to me, or rather, to a couple of stored procedures
>> :-)
>>
>> [1] Basically a cron that runs stored procedures:
>> https://mariadb.com/kb/en/mariadb/documentation/stored-programs-and-views/stored-programs-and-views-events/events/
>>


Re: [Analytics] eventlogging largest tables

2014-10-07 Thread Kevin Leduc
Maryana,

What about deleting old records from MobileWikiAppToCInteraction?  Do we
want to treat it the same as MobileWebClickTracking (i.e. delete records
from before 2014)?

On Tue, Sep 30, 2014 at 10:45 AM, Maryana Pinchuk 
wrote:

> Oh yeah, that'd be fine :)
>
> On Tue, Sep 30, 2014 at 10:38 AM, Ryan Kaldari 
> wrote:
>
>> Maryana, would it be OK if we delete the MobileWebClickTracking records
>> from before 2014? Would we still need those for any reason?
>>
>> On Tue, Sep 30, 2014 at 10:32 AM, Maryana Pinchuk wrote:
>>
>>> On Mon, Sep 29, 2014 at 3:10 PM, Dario Taraborelli <
>>> dtarabore...@wikimedia.org> wrote:
>>>
>>>> On Sep 27, 2014, at 11:42 AM, Aaron Halfaker wrote:
>>>>
>>>>> I'm not surprised that PageContentSaveComplete is big.  That's a very
>>>>> useful table and it sees a lot of rows for good reason (every revision
>>>>> saved on every wiki).
>>>>>
>>>>> As for the Multimedia/Mediaviewer tables, we should probably ping
>>>>> someone on that team to discuss them.
>>>>>
>>>>> Dario, can you speak for the MobileWebClickTracking and
>>>>> MobileWikiAppToCInteraction schemas?
>>>
>>> The mobile web team uses the MobileWebClickTracking to get a rough
>>> heatmap of taps on prominent UI elements, and the apps team uses
>>> MobileWikiAppToCInteraction
>>> to measure engagement with the table of contents on the Wikipedia app.
>>> They're both not primary metrics we're tracking but are useful to check in
>>> on every once in awhile. Does that answer your question?
>>>
>>>> neither I nor Oliver are using this data but it’s used for some Limn
>>>> dashboards by the Mobile team. Copying Maryana and Kaldari so they can
>>>> chime in
>>>>
>>>> D
>>>>
>>>>> On Sat, Sep 27, 2014 at 2:02 PM, Sean Pringle wrote:
>>>>>
>>>>>> Hi :-)
>>>>>>
>>>>>> These are the largest Eventlogging tables on m2-master:
>>>>>>
>>>>>> 145G  MobileWebClickTracking_5929948.ibd
>>>>>> 94G   PageContentSaveComplete_5588433.ibd
>>>>>> 61G   MediaViewer_8572637.ibd
>>>>>> 57G   MediaViewer_8245578.ibd
>>>>>> 30G   MultimediaViewerNetworkPerformance_7917896.ibd
>>>>>> 29G   MediaViewer_8935662.ibd
>>>>>> 24G   MobileWikiAppToCInteraction_8461467.ibd
>>>>>>
>>>>>> Are these sizes roughly expected?
>>>>>>
>>>>>> Anything we can discard or reduce?
>>>>>>
>>>>>> Where did the discussion on purging data end up?
>>>>>>
>>>>>> No immediate problems here, just rattling cages :-)
>>>>>>
>>>>>> BR
>>>>>> /s
>>>>>>
>>>>>> --
>>>>>> DBA @ WMF
>>>
>>>
>>> --
>>> Maryana Pinchuk
>>> Product Manager, Wikimedia Foundation
>>> wikimediafoundation.org
>>>
>>
>>
>
>
> --
> Maryana Pinchuk
> Product Manager, Wikimedia Foundation
> wikimediafoundation.org
>


Re: [Analytics] Analytics Dev Team Commitments 2014-09-18 -- 2014-09-30

2014-10-02 Thread Kevin Leduc
Result:

The team completed 34 of 57 points and is close to completing the 21 point
story #70887.  The remaining story to implement the metric “Rolling
recurring old active editors” remains more problematic with performance
issues.

Please note, the team is getting together on-site in San Francisco next
week.  Therefore the next sprint is cancelled as well as the showcase on
October 14th.  Next week the team will work together to set goals for the
next quarter and hack on things.

Final note: Vital Signs was demo’ed at the monthly metrics meeting.  This
is the culmination of a lot of hard work this last quarter and beforehand.
We consider Vital Signs a Minimum Viable Product (MVP):

https://metrics.wmflabs.org/static/public/dash/


Slides on the demo are here:
https://commons.wikimedia.org/wiki/File:Monthly_Metrics_2014-10_-_Vital_Signs.pdf

cheers,

Kevin Leduc


On Fri, Sep 19, 2014 at 10:05 AM, Kevin Leduc  wrote:

> Hi,
>
> The fruits of our labor on Editor Engagement Vital Signs (EEVS) are on
> display.  This is still an early release; we have a backlog of feedback
> from internal stakeholders and more iterations are to come.
> https://metrics.wmflabs.org/static/public/dash/
>
>
> This sprint’s commitments are:
>
> Bug ID  Component      Summary                                                          Points
> 69569   Wikimetrics    Story:d WikimetricsUser runs 'Rolling Recurring old active       13
>                        editors' report
> 67806   Visualization  Story: EEVSUser loads static site in accordance to Pau's design  13
> 71009   Wikimetrics    Update 'existing' Pages Created to include delete pages          5
> 71008   Wikimetrics    Update 'existing' Edits Metric to include deleted pages          5
> 70887   Dashiki        Story: Bookmarks / Statefull URL. Define protocol and use it to  21
>                        bootstrap the dashboard and keep state
>
> That’s 57 points in 5 stories
>
> Our progress is tracked in scrumbugs:
> http://sb.wmflabs.org/t/analytics-developers/2014-09-18/
>
>
> cheers,
> Kevin Leduc
>


[Analytics] Analytics Dev Team Commitments 2014-09-18 -- 2014-09-30

2014-09-19 Thread Kevin Leduc
Hi,

The fruits of our labor on Editor Engagement Vital Signs (EEVS) are on
display.  This is still an early release; we have a backlog of feedback
from internal stakeholders and more iterations are to come.
https://metrics.wmflabs.org/static/public/dash/


This sprint’s commitments are:

Bug ID  Component      Summary                                                          Points
69569   Wikimetrics    Story:d WikimetricsUser runs 'Rolling Recurring old active       13
                       editors' report
67806   Visualization  Story: EEVSUser loads static site in accordance to Pau's design  13
71009   Wikimetrics    Update 'existing' Pages Created to include delete pages          5
71008   Wikimetrics    Update 'existing' Edits Metric to include deleted pages          5
70887   Dashiki        Story: Bookmarks / Statefull URL. Define protocol and use it to  21
                       bootstrap the dashboard and keep state

That’s 57 points in 5 stories

Our progress is tracked in scrumbugs:
http://sb.wmflabs.org/t/analytics-developers/2014-09-18/


cheers,
Kevin Leduc


Re: [Analytics] Analytics Dev Team Commitments 2014-09-04 -- 2014-09-16

2014-09-18 Thread Kevin Leduc
The team completed 55 out of 55 points!  Go team.  Here are the slides from
the showcase:

https://docs.google.com/presentation/d/1y54uF5PkYc9Sa7VWOykKXQ4DXqh_n3VxDMAR2-CCKss/edit?usp=sharing

Cheers,

Kevin Leduc


On Thu, Sep 4, 2014 at 5:38 PM, Kevin Leduc  wrote:

> Hi,
>
> The team is focused on reaching its quarterly goals (
> https://www.mediawiki.org/wiki/Wikimedia_Engineering/2014-15_Goals#Analytics
> ) and part of the team is using Agile Scrum solely for the delivery of
> Editor Engagement Vital Signs. Production issues and Refinery development
> are handled by the other part of the team (see Adventures in Clusterland
> https://lists.wikimedia.org/pipermail/analytics/2014-September/002485.html
> )
>
>
>
> Here’s a summary of the next sprint:
>
> Bug ID  Component    Summary                                                                Points
> 67459   Wikimetrics  Story:b WikimetricsUser runs 'Rolling New Active Editors' report       8
> 67460   Wikimetrics  Story:c WikimetricsUser runs 'Rolling Surviving New Active             13
>                      Editors' report
> 68822   EEVS         Story: AnalyticsEng has static file with list of projects and metrics  8
> 68445   EEVS         Story: EEVSUser downloads report with correct Http Cache Headers       5
> 68142   EEVS         Story: EEVSUser adds/removes a metric/project                          21
>
> That’s 55 points in 5 stories.  You can see the sprint here:
>
> http://sb.wmflabs.org/t/analytics-developers/2014-09-16/
>
> cheers,
> Kevin Leduc
>


[Analytics] Analytics Dev Team Commitments 2014-09-04 -- 2014-09-16

2014-09-04 Thread Kevin Leduc
Hi,

The team is focused on reaching its quarterly goals (
https://www.mediawiki.org/wiki/Wikimedia_Engineering/2014-15_Goals#Analytics
) and part of the team is using Agile Scrum solely for the delivery of
Editor Engagement Vital Signs. Production issues and Refinery development
are handled by the other part of the team (see Adventures in Clusterland
https://lists.wikimedia.org/pipermail/analytics/2014-September/002485.html )



Here’s a summary of the next sprint:

Bug ID  Component    Summary                                                                Points
67459   Wikimetrics  Story:b WikimetricsUser runs 'Rolling New Active Editors' report       8
67460   Wikimetrics  Story:c WikimetricsUser runs 'Rolling Surviving New Active             13
                     Editors' report
68822   EEVS         Story: AnalyticsEng has static file with list of projects and metrics  8
68445   EEVS         Story: EEVSUser downloads report with correct Http Cache Headers       5
68142   EEVS         Story: EEVSUser adds/removes a metric/project                          21

That’s 55 points in 5 stories.  You can see the sprint here:

http://sb.wmflabs.org/t/analytics-developers/2014-09-16/

cheers,
Kevin Leduc


Re: [Analytics] Analytics Dev Team Commitments 2014-08-21 -- 2014-09-02

2014-09-03 Thread Kevin Leduc
Hi all,


The team completed 37 of 50 points, plus an additional 8 points with a
story from the previous sprint.  The Showcase slides are available here:

https://docs.google.com/presentation/d/1r1wlp7yfdT0i-FRMcYNd2uN8ndbvu4Z_a3scpvKi3PU/edit?usp=sharing

Cheers,

Kevin Leduc



On Thu, Aug 21, 2014 at 3:21 PM, Kevin Leduc  wrote:

> Hi,
>
> the analytics dev team has committed to the following user stories for the
> sprint starting today, ending September 2.
>
> Bug ID | Component   | Summary                                                              | Points
> 69297  | Wikimetrics | Story: EEVS user does not see reports for projects without databases | 3
> 68351  | EEVS        | Story: AnalyticsEng has website for EEVS                             | 34
> 67806  | EEVS        | Story: EEVSUser loads static site in accordance to Pau's design      | 13
>
> That’s 50 points in  3 Stories
>
> You can see the sprint here:
> http://sb.wmflabs.org/t/analytics-developers/2014-08-21/
>
> Note:
>
> Bug 68507 (replication lag may affect recurrent reports) is carried over
> from the previous sprint and will be completed shortly.
>
> Cheers,
> Kevin Leduc
>


Re: [Analytics] pitching the Gender Edit Dashboard

2014-08-29 Thread Kevin Leduc
Does Comscore have any gender data from their panels?

I think finding out more about the gender gap in editors is a scientific
research project.  It's not an easy problem to formulate and some thorough
research and experimentation is needed.  I'm not sure if pulling together
some reports from data we have would be beneficial or actionable.

If this pitch is meant for us to prioritize gender research and closing the
gap, then let's have a discussion about that.  How important is gender
research relative to everything else we are doing at WMF?  Is this
something someone at a university would be willing to study?




On Fri, Aug 29, 2014 at 7:01 AM, Leila Zia  wrote:

>
> On Fri, Aug 29, 2014 at 4:58 AM, Dan Andreescu 
> wrote:
>
>>
>>>- I wonder if we might explore ways to improve such a survey.  For
>>>example, we might include the gender question in the signup form for a
>>>small percentage of newly registered users.
>>>
>> This experiment sounds more useful than the current gender data.  Over
>> time, it would also allow us to track retention rate by gender for those
>> who answer the question.
>>
>
> +1
>
>


Re: [Analytics] Anonymizing and releasing 'edits per country' data for Wiki Projects

2014-08-25 Thread Kevin Leduc
I know the researchers have already put some thought around how to break
down editors by country.  I think this falls under metric standardization
and we want to have a consistent way of counting editors who edit across
several projects.  There is also an expectation that as you aggregate
numbers, editors are deduplicated (not counted twice).

I don't know if we are totally prepared to have this conversation yet.  My
priority list has discussions on how to break down editors by target site
(desktop, mobile, API) first, then how to aggregate editors across projects.


On Mon, Aug 25, 2014 at 10:55 AM, Yuvi Panda  wrote:

> Yay to more people finding it useful :)
>
> Editors / Active editors aren't too hard to add programmatically. The
> bigger problem is how to define 'editor from country' - one edit from
> that country? Does that mean that one editor can be considered to be
> from multiple countries? Do we double count mobile and desktop as
> separate?
>
> An easy way to do this would be:
> 1. An 'editor from a country' is someone who has made at least one
> edit from that country
> 2. A 'desktop editor from a country' is someone who has made at least
> one edit from that country on desktop
> 3. A 'mobile editor from a country' is someone who has made at least
> one edit from that country on mobile
>
> This muddles the data somewhat, since
> sum(editors_from_all_countries_for_a_project) !=
> total_editors_for_project, and also sum(mobile_editors,
> desktop_editors) per country != total_editors per country. However,
> this is super simple to implement and also still useful, so I might
> end up doing that.
>
> Of course, assuming this entire thing gets OK'd fully by analytics :)
>
> On Mon, Aug 25, 2014 at 6:14 PM, Jessie Wild  wrote:
> > THIS IS SO USEFUL!
> >
> > For grantmaking, this is the exact type of dataset we want to have
> > publicly available. A lot of the initiatives we fund are at a
> > country-based level, and our partners have a really hard time
> > understanding the effects of the work they are doing on the aggregate
> > language-wiki level. In addition to these edits per country, it would be
> > even more important for us to get the total number of editors / active
> > editors by country as well. Kevin - it would be great to get an update
> > from you on the timeline for this (in Q4 2014-15, it was punted to Q1
> > 2014-15, but I haven't heard anything about it yet ...)
> >
> > Thanks for starting this work, Yuvi!
> > Jessie
> >
> >
> > On Mon, Aug 25, 2014 at 9:43 AM, Yuvi Panda  wrote:
> >>
> >> On Mon, Aug 25, 2014 at 5:41 PM, Kevin Leduc wrote:
> >> > Hey Yuvi,
> >> >
> >> > this sounds like very interesting data to look at.  Here are my
> >> > thoughts:
> >>
> >> :D
> >>
> >> > - the Anonymization scheme sounds reasonable, and I'd like to hear
> >> > from someone else @ wikimedia who has similar experience anonymizing
> >> > data sets
> >>
> >> Glad to hear that!
> >>
> >> > - you were probably already thinking about it, but we need
> >> > documentation too: a wikipage with the name of the table, data
> >> > dictionary, etc... and even a blog post to announce the newly
> >> > available data.
> >>
> >> Oh yeah, definitely. Will come once the code, etc is done :)
> >>
> >>
> >> --
> >> Yuvi Panda T
> >> http://yuvi.in/blog
> >>
> >
> >
> >
> >
> > --
> > Jessie Wild Sneller
> > Grantmaking Learning & Evaluation
> > Wikimedia Foundation
> >
> > Imagine a world in which every single human being can freely share in
> > the sum of all knowledge.  Help us make it a reality!
> > Donate to Wikimedia
>
>
>
> --
> Yuvi Panda T
> http://yuvi.in/blog
>
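
The counting rule quoted above (an editor belongs to every country they have
edited from, and to every platform they have edited on) can be sketched in a
few lines of Python. This is only an illustration under stated assumptions:
the (editor, country, platform) tuples and the sample rows are hypothetical
and are not taken from the WPDMZ code or from any real table. It shows why
the per-country sums need not equal the project total.

```python
from collections import defaultdict


def editors_by_country(edits):
    """Map each country to the set of editors with at least one edit from it."""
    by_country = defaultdict(set)
    for editor, country, _platform in edits:
        by_country[country].add(editor)
    return dict(by_country)


def editors_by_country_and_platform(edits):
    """Map each (country, platform) pair to the editors who edited that way."""
    by_key = defaultdict(set)
    for editor, country, platform in edits:
        by_key[(country, platform)].add(editor)
    return dict(by_key)


# Hypothetical sample: "alice" edits from two countries and on two platforms.
edits = [
    ("alice", "NL", "desktop"),
    ("alice", "NL", "mobile"),
    ("alice", "DE", "desktop"),
    ("bob", "NL", "desktop"),
]

per_country = editors_by_country(edits)
per_platform = editors_by_country_and_platform(edits)

# Distinct editors in the project versus the sum of the per-country counts.
total_editors = len({editor for editor, _country, _platform in edits})
sum_over_countries = sum(len(names) for names in per_country.values())

# Within one country, desktop + mobile versus distinct editors.
nl_desktop_plus_mobile = (
    len(per_platform[("NL", "desktop")]) + len(per_platform[("NL", "mobile")])
)
nl_editors = len(per_country["NL"])
```

With this sample, alice is counted once for NL and once for DE, so the
per-country sum exceeds the number of distinct editors; the same happens for
desktop and mobile within NL. Deduplicating across countries or platforms
would need an extra rule (for example, assigning each editor a single "home"
country), which the thread leaves open.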


Re: [Analytics] Anonymizing and releasing 'edits per country' data for Wiki Projects

2014-08-25 Thread Kevin Leduc
Hey Yuvi,

this sounds like very interesting data to look at.  Here are my thoughts:

- the Anonymization scheme sounds reasonable, and I'd like to hear from
someone else @ wikimedia who has similar experience anonymizing data sets

- you were probably already thinking about it, but we need documentation
too: a wikipage with the name of the table, data dictionary, etc... and
even a blog post to announce the newly available data.




On Sun, Aug 24, 2014 at 5:21 PM, Yuvi Panda  wrote:

> Hello!
>
> I've been working for the last few days on
> https://github.com/Ironholds/WPDMZ, which currently generates raw data
> on 'number of non-bot edits per country', and I'd like to run some
> stats / make some graphs based on it. Since I'd like all my
> 'research' to be completely repeatable, I'd love it if we can make the
> 'raw data' (edits per country) publicly available on labsdb. I have
> most of the code written for it, *but* it needs anonymization.
>
> The biggest de-anonymization threats involve identifying which editors
> come from which countries, and can be executed in the following case:
>
> An editor is the only person editing from a country in a project where
> the country has low edit volume, and by a process of elimination /
> counting edits from a public source (like recentchanges), the
> individual editor can be connected to a particular country
>
> I propose the following Anonymization scheme:
>
> 1. No data for projects with less than a threshold of total
> *individual editors* in the time period for which the data is
> released.
> 2. For countries that have less than a threshold % of 'individual
> editors' in the time period, we just simply lump them in as 'other'.
>
> This removes most anonymization attacks I can think of. Thoughts? I
> can easily write up the code to generate these on a monthly basis and
> puppetize those to make the data publicly available. I think not just
> me, but lots of external researchers would benefit from such data.
>
> Thanks!
>
> --
> Yuvi Panda T
> http://yuvi.in/blog
>
>
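
The two-step anonymization scheme quoted above can be sketched as follows.
This is a minimal illustration, not the WPDMZ implementation: the function
name, the input layout (country mapped to an edit count and a distinct-editor
count), and the threshold values are all assumptions, since the thread does
not propose concrete numbers.

```python
def anonymize_project(rows, project_editors, min_project_editors, min_country_share):
    """Apply the two rules from the proposal to one project and time period.

    rows: {country: (edit_count, editor_count)}  (hypothetical layout)
    project_editors: distinct editors in the project for the period
    Returns None when the whole project is withheld (rule 1); otherwise a
    {country: edit_count} dict with small countries merged into 'other' (rule 2).
    """
    # Rule 1: release nothing for projects below the editor threshold.
    if project_editors <= 0 or project_editors < min_project_editors:
        return None

    released = {}
    other_edits = 0
    for country, (edit_count, editor_count) in rows.items():
        # Rule 2: countries with too small a share of editors become 'other'.
        if editor_count / project_editors < min_country_share:
            other_edits += edit_count
        else:
            released[country] = edit_count
    if other_edits:
        released["other"] = other_edits
    return released


# Hypothetical numbers, for illustration only.
sample = {"NL": (900, 45), "DE": (300, 14), "LU": (12, 1)}
released = anonymize_project(sample, project_editors=60,
                             min_project_editors=50, min_country_share=0.05)
withheld = anonymize_project(sample, project_editors=60,
                             min_project_editors=100, min_country_share=0.05)
```

Here LU (1 of 60 editors) falls under the 5% share and is folded into
'other', while the second call withholds the project entirely because it has
fewer than 100 editors. The project-wide editor count is passed in separately
because summing per-country editor counts can double count editors who edit
from more than one country.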


[Analytics] Analytics Dev Team Commitments 2014-08-21 -- 2014-09-02

2014-08-21 Thread Kevin Leduc
Hi,

the analytics dev team has committed to the following user stories for the
sprint starting today, ending September 2.

Bug ID | Component   | Summary                                                              | Points
69297  | Wikimetrics | Story: EEVS user does not see reports for projects without databases | 3
68351  | EEVS        | Story: AnalyticsEng has website for EEVS                             | 34
67806  | EEVS        | Story: EEVSUser loads static site in accordance to Pau's design      | 13

That’s 50 points in  3 Stories

You can see the sprint here:
http://sb.wmflabs.org/t/analytics-developers/2014-08-21/

Note:

Bug 68507 (replication lag may affect recurrent reports) is carried over
from the previous sprint and will be completed shortly.

Cheers,
Kevin Leduc


[Analytics] Expanding wikimetrics cohorts using CentralAuth

2014-08-20 Thread Kevin Leduc
Hey Dan G and Analytics team,

I wanted to continue and finish the discussion that happened during the
Analytics showcase earlier today.

We're implementing a new feature in Wikimetrics where you can upload a
cohort and check a box so that every user's accounts on other wikis
(projects) will be added to the cohort (using CentralAuth).  The purpose is
to see if editors are active on other projects.

The research scientists pointed out that there are issues with CentralAuth
and they are showing up in EventLogging (
https://bugzilla.wikimedia.org/show_bug.cgi?id=66101 ).

Let me try to sum up the issue here:
Suppose someone has an unattached account.  She then went to an editathon
and volunteered her name to be included in a cohort.  The resulting cohort
when expanded with CentralAuth would include users from other wikis.

Dan pointed out that it would be extremely unlikely for a cohort expanded
using CentralAuth to include unattached users.

I'm inclined to not worry about the issue and move ahead with releasing the
feature.

Please discuss if I'm missing something.


Re: [Analytics] Analytics Dev Team Commitments 2014-08-07 -- 2014-08-19

2014-08-19 Thread Kevin Leduc
Update: the team completed 34 of 42 points this sprint.  Slides from the
showcase are here:
https://docs.google.com/presentation/d/1caD0WzSx6PQFFU8Sl3AYDFyzVxfEXT1-UN55chnCB3I/edit?usp=sharing

cheers,
Kevin Leduc


On Mon, Aug 11, 2014 at 7:34 AM, Kevin Leduc  wrote:

> Correction:  The Analytics Dev team has not committed to complete 67806
> Story: EEVSUser loads static site in accordance with Pau's design.
>
> There was a misunderstanding and the team said they could work on the
> above issue during the sprint if there is time, but there is no commitment
> to complete the story during the sprint.
>
>
> On Fri, Aug 8, 2014 at 5:05 PM, Kevin Leduc  wrote:
>
>> Hi,
>>
>> the dev team has committed to the following user stories for the sprint
>> starting today, ending August 19.
>>
>> Bug ID | Component   | Summary                                                                  | Points
>> 68731  | Wikimetrics | Backing up wikimetrics data fails if data is written while we back it up | 5
>> 68833  | Wikimetrics | session management                                                       | 21
>> 68840  | EEVS        | Wikimetrics can't run a lot of recurrent reports at the same time        | 8
>> 67806  | Wikimetrics | Story: EEVSUser loads static site in accordance to Pau's design          | 13
>> 68507  | Wikimetrics | replication lag may affect recurrent reports                             | 8
>>
>> Total Points: 55
>>
>> You can see the sprint here:
>> http://sb.wmflabs.org/t/analytics-developers/2014-08-07/
>>
>> Cheers,
>>
>> Kevin Leduc
>>
>>
>


Re: [Analytics] Public EventLogging --> LabsDB

2014-08-13 Thread Kevin Leduc
Yes, getting EL data into labs would support longer term EEVS goals, and
I'm trying to focus on EEVS features we can release this quarter.


On Wed, Aug 13, 2014 at 3:56 PM, Dario Taraborelli <
dtarabore...@wikimedia.org> wrote:

> (expanding on what I think Dan is referring to re: goals), addressing this
> issue would allow EEVS to access data needed to generate breakdowns for
> metrics by method/target site (mobile, desktop, apps).
>
> On Aug 13, 2014, at 1:40 PM, Dan Andreescu 
> wrote:
>
> Kevin, for what it's worth I don't think that bug that Sean is asking for
> is that challenging.  The relevant part we'd have to change is really just
> a few lines [1].  I respect your decision of course, but I just wanted to
> point out that this issue does drive towards some of our goals, as we
> talked a bit about getting EventLogging data to be usable by Wikimetrics,
> and this is the first step.
>
>
> [1] -
> https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FEventLogging/4d917e1594e6f09784ab0e0bffccc144f87a11b3/server%2Feventlogging%2Fjrm.py#L167
>
>
> On Wed, Aug 13, 2014 at 4:19 PM, Aaron Halfaker 
> wrote:
>
>> OK.  Sounds reasonable.  Sorry to seem as though I am pushing on you &
>> the devs.  In fact, specifying that you won't have the bandwidth to even
>> consider the bug until next quarter gives me the power to push on others.
>>  >:)
>>
>> Thanks!
>> -Aaron
>>
>>
>> On Wed, Aug 13, 2014 at 8:56 PM, Kevin Leduc  wrote:
>>
>>> Hi Aaron,
>>>
>>> I was not planning on prioritizing any EventLogging work for the rest of
>>> this quarter.  The analytics dev team has a goal to get an EEVS dashboard
>>> running and I want to keep them focused otherwise we will not reach this
>>> goal.
>>>
>>> I'm tempted to ask what springle and YuviPanda can accomplish without
>>> the help of the analytics devs, but even that will imply discussions and
>>> distractions from our goals.
>>>
>>> In September I am planning on looking at what goals we can set for the
>>> next quarter and look at what we want to accomplish with EventLogging.  I
>>> was going to prioritize it at that point.
>>>
>>>
>>>
>>>
>>> On Wed, Aug 13, 2014 at 10:28 AM, Aaron Halfaker <
>>> ahalfa...@wikimedia.org> wrote:
>>>
>>>> Excellent.  Kevin, can you work to get that bug[1] prioritized and let
>>>> us know?   I can start working with R&D on a proposal to bring to legal.
>>>>
>>>> 1. https://bugzilla.wikimedia.org/show_bug.cgi?id=67450
>>>>
>>>>> It stands to reason that you would be interested in the capsule too as
>>>>> it holds the timestamp and wiki project the event applies to, but I
>>>>> imagine we can make fields public selectively.
>>>>
>>>>
>>>> Fair enough.  I think we can drop that one column from the capsule and
>>>> be quite happy with the rest.  No need to purge EventLogging.
>>>>
>>>> -Aaron
>>>>
>>>>
>>>> On Wed, Aug 13, 2014 at 6:08 PM, Nuria Ruiz 
>>>> wrote:
>>>>
>>>>> > Re. (2), I didn't say anything about that being related to
>>>>> > public/private.
>>>>> > This is a request from springle -- that if we are going to start
>>>>> > pushing Events to LabsDB, he'd like us to do so more efficiently.
>>>>> > That bug is about efficiently batching inserts.
>>>>> ah, my mistake. Kevin can do prioritization as needed.
>>>>>
>>>>> > If you are concerned about UserAgents as the sanitization page you
>>>>> > linked to suggests, then we should talk about the EventLogging
>>>>> > capsule, not the event.
>>>>> If you want to be so precise, sure, that is correct. Note that
>>>>> currently there is no distinction in storage as to the event and the
>>>>> capsule, they are stored together in the same record. Capsule data is only
>>>>> identified by a prefix on the column name. It stands to reason that you
>>>>> would be interested in the capsule too as it holds the timestamp and wiki
>>>>> project the event applies to, but I imagine we can make fields public
>>>>> selectively.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Aug 13, 2014 at 6:47 PM, Aaro
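
The idea discussed above (capsule and event fields live in the same record,
told apart by a column-name prefix, and fields can be made public
selectively) might look roughly like this. It is a sketch under loud
assumptions: the `event_` prefix, the `userAgent` column name, and the sample
record are hypothetical placeholders; the thread only says that a prefix on
the column name distinguishes the two and that one capsule column would be
dropped.

```python
def public_fields(record, withheld=("userAgent",)):
    """Return a copy of the record without the columns that must stay private.

    'withheld' is a hypothetical blocklist; the real list would come out of
    the review process mentioned in the thread.
    """
    return {column: value for column, value in record.items()
            if column not in withheld}


def split_record(record, event_prefix="event_"):
    """Split one stored row into (capsule, event) dicts by column prefix.

    Assumes, for illustration only, that event fields carry 'event_' and
    capsule fields carry no prefix.
    """
    capsule, event = {}, {}
    for column, value in record.items():
        if column.startswith(event_prefix):
            event[column[len(event_prefix):]] = value
        else:
            capsule[column] = value
    return capsule, event


# Hypothetical stored row mixing capsule and event columns.
row = {
    "timestamp": "20140813180800",
    "wiki": "enwiki",
    "userAgent": "ExampleBrowser/1.0",
    "event_pageId": 42,
}

public_row = public_fields(row)
capsule, event = split_record(public_row)
```

Filtering by an explicit list keeps the timestamp and wiki columns that
researchers asked for while withholding the one sensitive capsule column; an
allowlist instead of a blocklist would be the more conservative choice.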

Re: [Analytics] Public EventLogging --> LabsDB

2014-08-13 Thread Kevin Leduc
Hi Aaron,

I was not planning on prioritizing any EventLogging work for the rest of
this quarter.  The analytics dev team has a goal to get an EEVS dashboard
running and I want to keep them focused otherwise we will not reach this
goal.

I'm tempted to ask what springle and YuviPanda can accomplish without the
help of the analytics devs, but even that will imply discussions and
distractions from our goals.

In September I am planning on looking at what goals we can set for the next
quarter and look at what we want to accomplish with EventLogging.  I was
going to prioritize it at that point.




On Wed, Aug 13, 2014 at 10:28 AM, Aaron Halfaker 
wrote:

> Excellent.  Kevin, can you work to get that bug[1] prioritized and let us
> know?   I can start working with R&D on a proposal to bring to legal.
>
> 1. https://bugzilla.wikimedia.org/show_bug.cgi?id=67450
>
>> It stands to reason that you would be interested in the capsule too as it
>> holds the timestamp and wiki project the event applies to, but I imagine we
>> can make fields public selectively.
>
>
> Fair enough.  I think we can drop that one column from the capsule and be
> quite happy with the rest.  No need to purge EventLogging.
>
> -Aaron
>
>
> On Wed, Aug 13, 2014 at 6:08 PM, Nuria Ruiz  wrote:
>
>> > Re. (2), I didn't say anything about that being related to
>> > public/private.
>> > This is a request from springle -- that if we are going to start
>> > pushing Events to LabsDB, he'd like us to do so more efficiently.
>> > That bug is about efficiently batching inserts.
>> ah, my mistake. Kevin can do prioritization as needed.
>>
>> > If you are concerned about UserAgents as the sanitization page you
>> > linked to suggests, then we should talk about the EventLogging
>> > capsule, not the event.
>> If you want to be so precise, sure, that is correct. Note that currently
>> there is no distinction in storage as to the event and the capsule, they
>> are stored together in the same record. Capsule data is only identified by
>> a prefix on the column name. It stands to reason that you would be
>> interested in the capsule too as it holds the timestamp and wiki project
>> the event applies to, but I imagine we can make fields public selectively.
>>
>>
>>
>>
>>
>> On Wed, Aug 13, 2014 at 6:47 PM, Aaron Halfaker 
>> wrote:
>>
>>> Re. (2), I didn't say anything about that being related to
>>> public/private.  This is a request from springle -- that if we are going to
>>> start pushing Events to LabsDB, he'd like us to do so more efficiently.
>>>  That bug is about efficiently batching inserts.
>>>
>>> I don't know what you are talking about re. 90 day purges.  I'm talking
>>> about 100% public Event logging events -- E.g.
>>> https://meta.wikimedia.org/wiki/Schema:PageMove   Also, we do *not*
>>> need to purge EventLogging event data at 90 days.  We need to purge PII at
>>> 90 days.  We generally do not store PII in EventLogging events, but when we
>>> do, we organize 90 days purges as we have recently for the anonymous editor
>>> experiments.  If you are concerned about UserAgents as the sanitization
>>> page you linked to suggests, then we should talk about the EventLogging
>>> capsule, not the event.
>>>
>>> Re. (1), we are already performing this review internally in order to
>>> determine what does and does not conform to the Data Retention Guidelines.
>>>  It seems clear that a robust process could also identify non-sensitive
>>> Schemas that could be published in labs.
>>>
>>> -Aaron
>>>
>>>
>>> On Wed, Aug 13, 2014 at 5:00 PM, Nuria Ruiz  wrote:
>>>
>>>> Aaron,
>>>>
>>>> > (2) https://bugzilla.wikimedia.org/show_bug.cgi?id=67450
>>>> The bug does not have to do with making data public. It has to do with
>>>> how data is inserted into EL from the
>>>> consumers, so it deals with the 'system', not the 'data'. The raw data
>>>> as inserted cannot be replicated directly to be made public so whether
>>>> inserts are more efficient does not affect the public/private discussion.
>>>>
>>>>
>>>> > (1) there needs to be a good review process in place to make sure
>>>> > that the data we surface isn't sensitive
>>>> There is a bunch of work involved on this item. For example: per our
>>>> privacy policy some of this data should be discarded after 90 days and
>>>> currently it is not. Also, you are aware of the discussions under
>>>> sanitization:
>>>> https://www.mediawiki.org/wiki/EventLogging/UserAgentSanitization
>>>>
>>>> Basically to make EL data public it needs to be aggregated with a level
>>>> of anonymization we think is acceptable. There is quite a bit of work in
>>>> this regard, here are some bugs that were filed a while back:
>>>>
>>>> https://bugzilla.wikimedia.org/show_bug.cgi?id=62978
>>>>
>>>> https://bugzilla.wikimedia.org/show_bug.cgi?id=59832
>>>>
>>>>
>>>> On Wed, Aug 13, 2014 at 3:39 PM, Aaron Halfaker <
>>>> ahalfa...@wikimedia.org> wrote:
>>>>
>>>>> Hey folks,
>>>>>
>>>>> We've been discussing ways to make more Wikim

Re: [Analytics] Analytics Dev Team Commitments 2014-08-07 -- 2014-08-19

2014-08-11 Thread Kevin Leduc
Correction:  The Analytics Dev team has not committed to complete 67806
Story: EEVSUser loads static site in accordance with Pau's design.

There was a misunderstanding and the team said they could work on the above
issue during the sprint if there is time, but there is no commitment to
complete the story during the sprint.


On Fri, Aug 8, 2014 at 5:05 PM, Kevin Leduc  wrote:

> Hi,
>
> the dev team has committed to the following user stories for the sprint
> starting today, ending August 19.
>
> Bug ID | Component   | Summary                                                                  | Points
> 68731  | Wikimetrics | Backing up wikimetrics data fails if data is written while we back it up | 5
> 68833  | Wikimetrics | session management                                                       | 21
> 68840  | EEVS        | Wikimetrics can't run a lot of recurrent reports at the same time        | 8
> 67806  | Wikimetrics | Story: EEVSUser loads static site in accordance to Pau's design          | 13
> 68507  | Wikimetrics | replication lag may affect recurrent reports                             | 8
>
> Total Points: 55
>
> You can see the sprint here:
> http://sb.wmflabs.org/t/analytics-developers/2014-08-07/
>
> Cheers,
>
> Kevin Leduc
>
>


[Analytics] Analytics Dev Team Commitments 2014-08-07 -- 2014-08-19

2014-08-08 Thread Kevin Leduc
Hi,

the dev team has committed to the following user stories for the sprint
starting today, ending August 19.

Bug ID | Component   | Summary                                                                  | Points
68731  | Wikimetrics | Backing up wikimetrics data fails if data is written while we back it up | 5
68833  | Wikimetrics | session management                                                       | 21
68840  | EEVS        | Wikimetrics can't run a lot of recurrent reports at the same time        | 8
67806  | Wikimetrics | Story: EEVSUser loads static site in accordance to Pau's design          | 13
68507  | Wikimetrics | replication lag may affect recurrent reports                             | 8

Total Points: 55

You can see the sprint here:
http://sb.wmflabs.org/t/analytics-developers/2014-08-07/

Cheers,

Kevin Leduc


Re: [Analytics] Replication lag on analytics-store.eqiad.wmnet >12 hours for s1 replicas

2014-08-01 Thread Kevin Leduc
I just added the bug to the Scrumbugs backlog.

Christian, you're right about it not getting prioritized soon.  We'll go
through the backlog again in September while doing some release planning
for the next quarter.


On Thu, Jul 31, 2014 at 4:05 AM, Christian Aistleitner <
christ...@quelltextlich.at> wrote:

> Hi Sean,
>
> On Thu, Jul 31, 2014 at 12:19:33PM +1000, Sean Pringle wrote:
> > On Wed, Jul 30, 2014 at 4:40 PM, Christian Aistleitner <
> > christ...@quelltextlich.at> wrote:
> >
> > >
> > > Lag on log is currently still ~ 16 hours. I'll keep an eye on it.
> > >
> >
> > https://bugzilla.wikimedia.org/show_bug.cgi?id=67450
>
> Full ACK.
>
> I was really glad when I saw the bug getting added back then.
>
> However, given what I heard about our investment in EventLogging,
> I doubt that it'll get prioritized soon :-(
>
>
> > EL could be much less susceptible to lag, and recover faster when it
> occurs.
>
> Full ACK.
>
> Although the final problematic query finished yesterday, with the
> current rate of replication lag recovery, it'll take us until tomorrow
> to have fully recovered :-(
>
> Have fun,
> Christian
>
>
> --
>  quelltextlich e.U.  \\  Christian Aistleitner 
>Companies' registry: 360296y in Linz
> Christian Aistleitner
> Kefermarkterstrasze 6a/3 Email:  christ...@quelltextlich.at
> 4293 Gutau, Austria  Phone:  +43 7946 / 20 5 81
>  Fax:+43 7946 / 20 5 81
>  Homepage: http://quelltextlich.at/
> ---
>
>
>
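
Bug 67450, linked in the quoted message above, is described in the "Public
EventLogging --> LabsDB" thread as being about efficiently batching inserts.
The general technique can be sketched as below. This is not the EventLogging
code: it uses Python's built-in sqlite3 as a stand-in for the real database,
and the table name, columns, and batch size are hypothetical.

```python
import sqlite3


def insert_batched(conn, events, batch_size=100):
    """Insert events in fixed-size batches, committing once per batch.

    Fewer, larger transactions generally mean fewer statements for replicas
    to replay than one commit per event. Returns the number of batches.
    """
    cursor = conn.cursor()
    batches = 0
    for start in range(0, len(events), batch_size):
        batch = events[start:start + batch_size]
        # executemany runs the same INSERT for every row in the batch.
        cursor.executemany("INSERT INTO log (wiki, ts) VALUES (?, ?)", batch)
        conn.commit()
        batches += 1
    return batches


# In-memory database with a hypothetical two-column event table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (wiki TEXT, ts TEXT)")

events = [("enwiki", "2014-07-31T00:00:%02d" % (i % 60)) for i in range(250)]
batches = insert_batched(conn, events, batch_size=100)
row_count = conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]
```

With 250 events and a batch size of 100 this performs three commits instead
of 250. Whether that is enough to reduce replication lag on the real cluster
is not something this sketch can show.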


Re: [Analytics] Request for data purge for a select number of schemas

2014-07-31 Thread Kevin Leduc
aha, I should have logged a bug a long time ago, but I was too much of a
newbie to know.  Here it is:
https://bugzilla.wikimedia.org/show_bug.cgi?id=68978

Christian: before I prioritize it, can you scope out how much work would be
required?

thanks,
Kevin


On Thu, Jul 31, 2014 at 4:08 PM, Christian Aistleitner <
christ...@quelltextlich.at> wrote:

> Hi Steven,
>
> On Thu, Jul 31, 2014 at 10:00:56AM -0700, Steven Walling wrote:
> > On Thu, Jul 31, 2014 at 9:51 AM, Christian Aistleitner <
> > christ...@quelltextlich.at> wrote:
> > > So if there is no extra-special need, [...]
> >
> > [...] because that is what we
> > agreed to with Legal. [...]
>
> “Agreement with legal” qualifies as perfectly fine “extra-special
> need” for me :-)
>
> Let's spend time removing the Schemas from the logs then.
>
> Since you said “probably” in the OP when it came to the data to
> remove. ... is it sufficient to remove the six schemas from
>
> https://meta.wikimedia.org/wiki/Research:Asking_anonymous_editors_to_register#Schemas
> ?
> Or are there further schemas?
>
> I guess removal of the data for all days in the past, or only some
> period back?
>
> From my point of view, we can take the discussion of details off-list.
>
> > I also let Kevin know months ago we would need some help doing this
> > from Analytics.
>
> Kevin, please do chime in on such threads then :-)
>
> Have fun,
> Christian
>
>
>
> --
>  quelltextlich e.U.  \\  Christian Aistleitner 
>Companies' registry: 360296y in Linz
> Christian Aistleitner
> Kefermarkterstrasze 6a/3 Email:  christ...@quelltextlich.at
> 4293 Gutau, Austria  Phone:  +43 7946 / 20 5 81
>  Fax:+43 7946 / 20 5 81
>  Homepage: http://quelltextlich.at/
> ---
>


[Analytics] Analytics Dev Team Commitments 2014-07-24 -- 2014-08-05

2014-07-24 Thread Kevin Leduc
Hi,

the dev team has committed to the following user stories for the sprint
starting today, ending August 5.

Bug ID | Component   | Summary                                       | Points
68516  | Wikimetrics | Story: Researcher has prototype for wikimania | 8

Total Points: 8

You can see the sprint here:
http://sb.wmflabs.org/t/analytics-developers/2014-07-24/

Notes:

- 2/3 of the team is going on vacation during this sprint impacting our
regular velocity.

- Story 67128 has been completed since the previous sprint ended.

- Issues 67694, 68516 are carried over from the last sprint and work will
continue on them.

- Issue 68519 is new.

- Issues with 0 points are considered high priority production issues that
need to be resolved relatively quickly.  We do not wait for our tasking or
sprint planning to work on them.  The dev team takes this background noise
into account when committing to a sprint.

Regards,
Kevin Leduc


[Analytics] Analytics Dev Team Showcase

2014-07-22 Thread Kevin Leduc
The dev team completed its 2 week sprint today and showcased its progress.
 The slides from the showcase are here:
https://docs.google.com/presentation/d/1UtQpfgHW-kIeaeHE_RB_W16ZD3qWvQfzVi0ZUFQOyiE/edit?usp=sharing

Following our planning session this coming Thursday, the team will announce
the user stories the team will commit to completing for the next sprint.


Re: [Analytics] Team's Sprint Commitments

2014-07-15 Thread Kevin Leduc
Hi Pierre-Selim

Before the last sprint, we adjusted how many points we assign to stories.
 We didn't have enough range to compare the complexity, and all things
ended up being 5, 8 or 13 points.  This was identified at a retrospective
so we decided to recalibrate our points scale.  Therefore it's not useful
to compare the points in the last sprint with the past.  We mentioned this
during our last showcase.

https://docs.google.com/a/wikimedia.org/presentation/d/1Y2uI_oOhXGpcn8y-EHBAAqzxS2Lp5OFEXIkIkd6J6A0/edit#slide=id.gb4d76b02_00


And yes, over the last few months our velocity has been increasing.  I
think there are many reasons for this:
- we have not added or lost any team members in the last 3 months
- as the new product manager, I have been creating smaller stories and
helping the team focus (I'm trying hard not to take all the credit here)
- the team has really taken on completing its commitments during a sprint
and we have been very interested in where we are at each scrum (we use a
spreadsheet and burn up hours)
- We have also been focused on fewer quarterly goals and have pushed back on
some issues/requests.



On Tue, Jul 15, 2014 at 3:27 PM, Pierre-Selim 
wrote:

> Just out of curiosity (as a product owner myself), as I can see the team
> is planning and delivering more and more at each sprint 16, 31, 45 and 55
> (current sprint): is there any reason for such an increase in velocity?
>
> Oh btw kudos for all the features delivered, and thank you for the
> transparency which is not always easy to do :)
>
>
> 2014-07-16 0:17 GMT+02:00 Christian Aistleitner <
> christ...@quelltextlich.at>:
>
>> Hi,
>>
>> On Wed, Jul 02, 2014 at 11:00:44AM +0200, Christian Aistleitner wrote:
>> > On Tue, Jul 01, 2014 at 09:16:26AM -0700, Kevin Leduc wrote:
>> > > Our current sprint started Thursday June 26 and ends Tuesday July 8th.
>> > > [...]
>> > > http://sb.wmflabs.org/t/analytics-developers/2014-06-26/
>>
>> Of the originally planned 58 points, we delivered 45 points.
>>
>> We did not accomplish:
>>
>> >  |  67129 |  8 | Refinery | Admin has versioned and sync'ed files in HDFS |
>> >  |  67128 |  5 | Refinery | Admin has duplicate monitoring in Icinga      |
>> Both items have been carried over to the new sprint.
>> * Bug 67129 has been done in the meantime.
>> * Bug 67128 has been re-discussed and had it's requirements and
>>   purpose changed considerably, and now better fits the overall
>>   picture and delivers more value. But it also made the card bigger.
>>
>> Have fun,
>> Christian
>>
>>
>>
>> --
>>  quelltextlich e.U.  \\  Christian Aistleitner 
>>Companies' registry: 360296y in Linz
>> Christian Aistleitner
>> Kefermarkterstrasze 6a/3 Email:  christ...@quelltextlich.at
>> 4293 Gutau, Austria  Phone:  +43 7946 / 20 5 81
>>  Fax:+43 7946 / 20 5 81
>>  Homepage: http://quelltextlich.at/
>> ---
>>
>
>
> --
> Pierre-Selim
>
>
>


[Analytics] Analytics Team Showcase

2014-07-09 Thread Kevin Leduc
Hi all,

The slides from the analytics showcase are publicly available:
https://docs.google.com/presentation/d/1Y2uI_oOhXGpcn8y-EHBAAqzxS2Lp5OFEXIkIkd6J6A0/edit?usp=sharing

I added a few screenshots to them to highlight what was showcased live.


[Analytics] Team's Sprint Commitments

2014-07-01 Thread Kevin Leduc
Greetings,

The analytics engineering team is now using ScrumBugs to manage and track
its commitments for every Sprint (2 week iterations).

Our current sprint started Thursday June 26 and ends Tuesday July 8th.  You
can see the Stories (features) and bugs we are presently working on:
http://sb.wmflabs.org/t/analytics-developers/2014-06-26/

Note, the third pie chart "Story User" shows the beneficiaries of the
features being implemented.


[Analytics] Wikipedia featured on FiveThirtyEight blog

2014-05-30 Thread Kevin Leduc
It's nice to see they are interested in our data:

http://fivethirtyeight.com/datalab/the-100-most-edited-wikipedia-articles/


Re: [Analytics] Analytics maintainers

2014-05-21 Thread Kevin Leduc
I don't get it... am I going to hell for requesting deletion of an article
:-)




On Tue, May 20, 2014 at 3:38 PM, Ori Livneh  wrote:

> On Tue, May 20, 2014 at 3:10 PM, Federico Leva (Nemo) wrote:
>
>> Obligatory Florence reading: https://meta.wikimedia.org/wiki/Keep_history
>
>
> Obligatory Florence reading:
> https://it.wikisource.org/wiki/Divina_Commedia/Purgatorio/Canto_XXXI#riga96
>


Re: [Analytics] Analytics maintainers

2014-05-20 Thread Kevin Leduc
Does anyone know who the mediawiki.org administrators are, or where I can
find out?  I want to request deletion of the pages Nuria mentioned.


On Mon, May 19, 2014 at 1:27 PM, Toby Negrin  wrote:

> Thanks Nuria -- Kevin is working on some new docs and is aware of some of
> your work in this area.
>
> The Leadership scrum isn't a thing that we do; Kevin and I have a 1:1 each
> week where we talk about the issues that would be addressed in this meeting.
>
> The Engineering scrum can also be deleted as I believe Kevin is
> documenting this elsewhere.
>
> -Toby
>
>
> On Mon, May 19, 2014 at 5:12 AM, Nuria Ruiz  wrote:
>
>> I have also updated some docs, removing references to diedrik and
>> mingle in a bunch of places. I saw some pages that need deletion, as
>> their info is (I believe) being updated by Kevin in a new set of docs.
>>
>> However, I did not have permission to delete:
>>
>>
>> https://www.mediawiki.org/wiki/Analytics/Management/Analytics_Leadership_Scrum
>>
>> This one either needs deletion or much updating:
>>
>>
>> https://www.mediawiki.org/wiki/Analytics/Management/Analytics_Engineering_Scrum
>>
>> On Mon, May 19, 2014 at 1:10 AM, Kevin Leduc  wrote:
>> > Hi Erik,
>> >
>> > I'll take some time out of my 1:1 with Toby this week to update the list of
>> > Maintainers.
>> > Late last week, I also asked Guillaume to refactor our project list so it
>> > will be easier to maintain going forward (especially for the monthly
>> > engineering reports).
>> >
>> >
>> >
>> > On Mon, Apr 28, 2014 at 6:35 PM, Erik Moeller wrote:
>> >>
>> >> This could use some updating to reflect current team roles/membership:
>> >>
>> >> https://www.mediawiki.org/wiki/Developers/Maintainers#Analytics
>> >> https://www.mediawiki.org/wiki/Analytics#Projects
>> >> --
>> >> Erik Möller
>> >> VP of Engineering and Product Development, Wikimedia Foundation
>> >>


Re: [Analytics] generating sequences on analytics slaves

2014-05-18 Thread Kevin Leduc
Has anyone taken responsibility for fixing this?


On Sun, May 4, 2014 at 5:58 AM, Sean Pringle  wrote:

> Hi All
>
> The query form listed inline below runs periodically on s1-analytics-slave
> and now analytics-store. It generates 30 days worth of data without gaps by
> joining a table known to have more than 30 rows.
>
> That old trick is perfectly OK. However the table chosen is
> information_schema.columns and that's NOT OK :-)
>
> The problem is that the metadata in some information_schema tables must be
> materialized every time they're accessed. In this case the .frm file for
> *every table in the system* must be opened and checked before the query
> runs[1]. Yes, every table.
>
> This was probably always slow on s1-analytics-slave with enwiki + log +
> personal tables. It's even slower now that analytics-store holds all wikis.
>
> SELECT
>     Month.Date,
>     COALESCE(Web.Web, 0) AS Web
> FROM
>     (SELECT
>         DATE_FORMAT(ADDDATE(CURDATE() - INTERVAL 30 - 1 DAY,
>             @num:=@num+1), '%Y-%m-%d') AS Date
>     FROM information_schema.columns, (SELECT @num:=-1) num LIMIT 30)
>     AS Month
> LEFT JOIN
>     (SELECT DATE(timestamp) AS Date, SUM(1) AS Web FROM (
>         SELECT timestamp, wiki, event_username, event_action,
>             event_namespace, event_userEditCount
>         FROM MobileWebEditing_5644223
>         UNION
>         SELECT timestamp, wiki, event_username, event_action,
>             event_namespace, event_userEditCount
>         FROM MobileWebEditing_6077315
>         UNION
>         SELECT timestamp, wiki, event_username, event_action,
>             event_namespace, event_userEditCount
>         FROM MobileWebEditing_6637866
>         UNION
>         SELECT timestamp, wiki, event_username, event_action,
>             event_namespace, event_userEditCount
>         FROM MobileWebEditing_7675117
>     ) AS MobileWebEditing
>     WHERE
>         event_action = 'error'
>         AND wiki != 'testwiki'
>     GROUP BY Date
>     ) AS Web
> ON Month.Date = Web.Date;
>
> MariaDB 10 has an SQL trick for generating sequences. Hackish, but simple:
>
> SELECT seq FROM seq_1_to_5;
>
> +-----+
> | seq |
> +-----+
> |   1 |
> |   2 |
> |   3 |
> |   4 |
> |   5 |
> +-----+
>
> https://mariadb.com/kb/en/sequence/
>
> Alternatively, we can put a real sequence table somewhere handy with, say,
> 1000 integers.
>
> BR
> Sean
>
> [1] The file accesses can be alleviated by having large table [definition]
> cache(s), however then we're talking hundreds of thousands more open file
> handles which is a whole new ceiling waiting to be hit :-)
>
> --
> DBA @ WMF
>


Re: [Analytics] Analytics maintainers

2014-05-18 Thread Kevin Leduc
Hi Erik,

I'll take some time out of my 1:1 with Toby this week to update the list of
Maintainers.
Late last week, I also asked Guillaume to refactor our project list so it
will be easier to maintain going forward (especially for the monthly
engineering reports).



On Mon, Apr 28, 2014 at 6:35 PM, Erik Moeller  wrote:

> This could use some updating to reflect current team roles/membership:
>
> https://www.mediawiki.org/wiki/Developers/Maintainers#Analytics
> https://www.mediawiki.org/wiki/Analytics#Projects
> --
> Erik Möller
> VP of Engineering and Product Development, Wikimedia Foundation
>


[Analytics] Analytics Sprint Schedule Change

2014-04-23 Thread Kevin Leduc
Hi all,

I am the new Product Manager for Analytics at Wikimedia.  I have been on
the job for a few weeks now and still have lots to learn :-)  I am writing
to let you know about a one-off change in the development team's work
schedule.

The analytics team is canceling its upcoming [Agile/Scrum] sprint
scheduled to start May 1st and end May 13th.  The showcase on May 13th is
canceled as well.  The Analytics team has off-site meetings beginning May
5th and will be at the Wikimedia Hackathon in Zurich until May 11th.

We will resume our regularly scheduled 2-week sprints and respective
meetings when we are back.

Sincerely,
Kevin Leduc