Re: [Analytics] [Multimedia] Filtering out outliers in data used to generate tsvs

2014-04-21 Thread Nuria Ruiz
As Gergo pointed out, these early results may be because our first beta testers may have some faster connections than average users. But could there also be some bots or other traffic which could be distorting the results? I know that we are working next on histograms that will give us a better

Re: [Analytics] [Multimedia] Using EventLogging for funnel analysis

2014-05-15 Thread Nuria Ruiz
The timestamp at which the current flow through the funnel began (will need to be stored in a cookie and reset at loads of step 1) I would strongly advise against using cookies for this purpose. Cookies will easily get bloated if we set a precedent of using them to 'support' event logging

Re: [Analytics] [Multimedia] Using EventLogging for funnel analysis

2014-05-16 Thread Nuria Ruiz
Thanks Aaron, I will try something along these lines. This avoids the latency concerns mentioned by Nuria, and it is very flexible - we'll see how painful it is to aggregate the data on the backend. So we agree you do not need to use cookies right? Being a single page app you should not need

Re: [Analytics] [Multimedia] EventLogging ballooning

2014-05-20 Thread Nuria Ruiz
[gerco] Whenever we display geometric means, we weight by sampling rate: exp(sum(sampling_rate * ln(value)) / sum(sampling_rate)) instead of exp(avg(ln(value))). [gilles] I don't follow the logic here. Like percentiles, averages should be unaffected by sampling, geometric or not.
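The two formulas quoted in the snippet above can be compared with a minimal Python sketch. The function names and the sample numbers are hypothetical and not taken from the thread; `weights` stands in for whatever per-event `sampling_rate` value the dashboards use, and how that value is defined (a ratio or its inverse) is an assumption the thread does not settle.

```python
import math


def geometric_mean(values):
    """Unweighted form: exp(avg(ln(value)))."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def weighted_geometric_mean(values, weights):
    """Weighted form: exp(sum(w * ln(value)) / sum(w))."""
    total_weight = sum(weights)
    weighted_log_sum = sum(w * math.log(v) for v, w in zip(values, weights))
    return math.exp(weighted_log_sum / total_weight)


# Hypothetical data: two measurements recorded with different weights.
values = [10.0, 1000.0]
weights = [1.0, 3.0]

unweighted = geometric_mean(values)
weighted = weighted_geometric_mean(values, weights)
```

When every event carries the same weight the two forms give the same result, which is one way to read the objection in the snippet; they differ only when events logged at different sampling rates are mixed in one aggregate.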

Re: [Analytics] purging old data from eventlogging db

2014-05-21 Thread Nuria Ruiz
Not to hijack the thread, but: to do this in the schema itself confuses the structure of the data with the mechanics of its use. I think having a couple of helpers in JavaScript and PHP for simple random sampling is sufficient. Much agree with ori here. We would be bloating schema with

Re: [Analytics] [Multimedia] Media Viewer Dashboards

2014-05-21 Thread Nuria Ruiz
[gerco]From action events, we were getting about 15M a day, and we only use them to show total counts (daily number of clicks etc). How do we tell when the sampling ratio is right for that? [gilles] I think you're overthinking it, you seem to be looking for the perfect figure. Let's start with an

Re: [Analytics] Please help maintain our dashboard directory

2014-05-25 Thread Nuria Ruiz
It would help if limn set up an empty robots.txt instead of returning garbage to search engines. :) That might help a very small bit as much of limn is client side generated. The core problem is that limn is just a visualization tool, there is no browsing component so either you know the endpoint

Re: [Analytics] Data quality issues with account creation log

2014-06-06 Thread Nuria Ruiz
If someone could document the reasons why the userName is needed on this schema it would be great. They can be documented on the schema talk page: http://meta.wikimedia.org/wiki/Schema_talk:ServerSideAccountCreation When I looked at this issue early on it was not at all obvious to me why - if you

[Analytics] Analytics team is hiring.

2014-06-17 Thread Nuria Ruiz
Hello, Just a brief note to let everyone know that the analytics team is hiring. If you have an interest in analytics, Wikipedia and its sister projects, we would love to hear from you. Check our positions and apply: https://www.mediawiki.org/wiki/Analytics/Research_and_Data#Open_positions

Re: [Analytics] EventLogging on graphite

2014-06-18 Thread Nuria Ruiz
mmm... I am not sure whether 'per schema' reports worked well before. Need to look at code and see whether the schema counts are being sent. Overall counts seem to be working well:

[Analytics] Fwd: ** PROBLEM alert - tungsten/Throughput of event logging events is CRITICAL **

2014-06-25 Thread Nuria Ruiz
(to public list and cc-ing Nemo) Hello, Since the last time we had an increase in throughput in Event Logging Nemo had to notify us via e-mail, this is just a brief note to the list to say that we now have throughput monitoring for event logging and it is working. We had a throughput spike today that

[Analytics] EL graphite counts

2014-06-30 Thread Nuria Ruiz
Team: I have added some info to wikitech on how to troubleshoot issues with EL and graphite: https://wikitech.wikimedia.org/wiki/EventLogging#Fix_graphite_counts_not_working.3F https://wikitech.wikimedia.org/wiki/EventLogging#Graphite Thanks, Nuria

[Analytics] Monitoring for Event Logging on Graphite

2014-07-09 Thread Nuria Ruiz
Hello, We have restored per schema monitoring for Event Logging in graphite. Users of the Event Logging system can use the schema monitoring to see how big (or small) their usage of EventLogging is compared to the total throughput of events. See for example the overall rate of incoming

Re: [Analytics] Storing intermediate results for a limn1 - analytics-store DB query

2014-07-10 Thread Nuria Ruiz
Gerco, I was trying to access: http://multimedia-metrics.wmflabs.org/ but no luck. There is a third choice as far as I can see (my team needs to double check me on this). You could have a metric in wikimetrics that harvests the data you are interested in from enwiki, eswiki, arwiki databases

Re: [Analytics] Dashboard-like frontend for graphite

2014-07-10 Thread Nuria Ruiz
[Steven] Considering the pain and suffering Limn causes us, this seems like an interesting [Steven] avenue to explore for internal dashboard needs. So true. It sure causes me pain and suffering seeing every js library known to mankind being used there. :) We will definitely take a look at the

Re: [Analytics] Dashboard-like frontend for graphite

2014-07-10 Thread Nuria Ruiz
, 2014 at 8:03 PM, Steven Walling swall...@wikimedia.org wrote: On Thu, Jul 10, 2014 at 10:40 AM, Nuria Ruiz nu...@wikimedia.org wrote: Please take a look at the prototype of the editor vital signs dashboard as that makes the point of what is what we are doing in the near term: http

[Analytics] Hadoop and More. An overview of Analytics infrastructure

2014-07-16 Thread Nuria Ruiz
Hello everyone, Just an FYI that we gave a talk yesterday about the hadoop infrastructure we have recently set up in production to receive and store pageview data. Talk is about 25 minutes long and recording is available here: https://plus.google.com/u/0/events/c53ho5esd0luccd09a1c30rlrmg

Re: [Analytics] Media Viewer User Preference Data

2014-07-16 Thread Nuria Ruiz
a) aim to track total users who enable/disable Media Viewer, rather than just events b) switch to a 3-state preference setting: enabled / disabled / default c) try to measure the total number of users in each group (instead of daily events) I assume we are talking about logging stuff for logged

[Analytics] Editor Engagement Vital Signs Dashboard

2014-07-24 Thread Nuria Ruiz
(sending to public analytics list plus people with whom we have talked about dashboard technologies in the past) Team: As you know we are building a dashboard to showcase editor engagement metrics and to explore replacement of our current dashboarding technology. We have spent time researching

Re: [Analytics] A follow-up question

2014-08-05 Thread Nuria Ruiz
Hackathon. My Hackathon wish is to duplicate and reapply what Nuria Ruiz and Andrew Otto has done for NARA analytics pilot. https://commons.wikimedia.org/wiki/Commons:GLAMwiki_Toolset_Project/NARA_analytics_pilot So to your knowledge, is it feasible to do so, in terms of (a) setting up

[Analytics] Reportcard instructions

2014-08-05 Thread Nuria Ruiz
Team, I have updated the reportcard instructions on how to generate the reportcard from the files Erik Z sends. https://www.mediawiki.org/wiki/Analytics/ReportCard Thanks, ___ Analytics mailing list Analytics@lists.wikimedia.org

Re: [Analytics] eventlogging largest tables

2014-10-02 Thread Nuria Ruiz
Should we be the ones taking care of it? I'm not sure that the DB credentials I currently have can delete content. Neither can the ones we have. In the absence of a regular cleanup process (which is on our team to do) I think we just have to request Sean Pringle to delete the data. If anyone knows

[Analytics] Notes on EventLogging and SendBeacon meeting on 10/3

2014-10-03 Thread Nuria Ruiz
Please correct/amend as needed: http://www.mediawiki.org/wiki/Extension:EventLogging/sendBeacon#Meeting_notes_for_10.2F3_meeting Thanks to everyone attending

Re: [Analytics] eventlogging largest tables

2014-10-08 Thread Nuria Ruiz
We can automate purging using the MariaDB Event Scheduler[1] if you guys want a once-off-set-and-forget solution. Eg This sounds great for all the tables discussed on the thread. Is it easy to add tables to that procedure? On Mon, Oct 6, 2014 at 8:22 AM, Sean Pringle

Re: [Analytics] Traffic device breakdown

2014-10-10 Thread Nuria Ruiz
At some point I believe we hope to just, you know. Have a regularly updated browser matrix somewhere. I REALLY think this should make it into our goals, if it cannot be done this quarter it should for sure be done this quarter. Do we not have more recent data than May? On Fri, Oct 10, 2014 at

Re: [Analytics] Traffic device breakdown

2014-10-10 Thread Nuria Ruiz
oke...@wikimedia.org wrote: On 10 October 2014 16:02, Nuria Ruiz nu...@wikimedia.org wrote: At some point I believe we hope to just, you know. Have a regularly updated browser matrix somewhere. I REALLY think this should make it into our goals, if it cannot be done this quarter it should

Re: [Analytics] Traffic device breakdown

2014-10-14 Thread Nuria Ruiz
to the newly updated version. On Fri, Oct 10, 2014 at 9:59 PM, Oliver Keyes oke...@wikimedia.org wrote: Woah! Nice :D How are definitions updates handled? On 10 October 2014 18:58, Nuria Ruiz nu...@wikimedia.org wrote: 1. A UDF for ua-parser or whatever we decide to use (this will possibly

Re: [Analytics] Traffic device breakdown

2014-10-17 Thread Nuria Ruiz
(with preliminary data) is that neither 2.1 nor 2.2 amount to 1% of traffic to the mobile site On Fri, Oct 17, 2014 at 2:37 PM, Christian Aistleitner christ...@quelltextlich.at wrote: Hi, [ leaving other things in this thread aside ] On Thu, Oct 16, 2014 at 07:15:03PM -0700, Nuria Ruiz wrote: iOS

Re: [Analytics] Desktop screen size/viewport heatmaps

2014-10-23 Thread Nuria Ruiz
The pngs do not render for me but have you seen so-called-treemap plots to represent screen size in the user base? They are very self descriptive. Here is a famous one for android devices and screen sizes, scroll down a bit for device fragmentation:

[Analytics] stat1002 log cleanup

2014-10-30 Thread Nuria Ruiz
Hello, To comply with our privacy policy we are going to purge logs in 1002 that are older than 90 days. Please let us know whether this is an issue. We hope to have these changes done by the end of next week. A concrete example: Logs in, for example, the eventlogging archiving directory:

Re: [Analytics] stat1002 log cleanup

2014-11-05 Thread Nuria Ruiz
there affect what logs we have stored in the DB? Is this an intermediate log storage place, a canonical one, etc.? What will we no longer be able to do after it is pruned? -Aaron On Thu, Oct 30, 2014 at 2:35 PM, Nuria Ruiz nu...@wikimedia.org wrote: Also, I'm not clear on the significance of the EL

Re: [Analytics] Analytics

2014-11-12 Thread Nuria Ruiz
), and whether the event validates against the schema. For the sample output you pasted earlier, or another sample output, can you let us know if validation section shows Valid? Leila On Mon, Nov 10, 2014 at 3:24 PM, Nuria Ruiz nu...@wikimedia.org wrote: Joel, For questions like these going forward

Re: [Analytics] Analytics

2014-11-13 Thread Nuria Ruiz
come in as of late, which could point to an issue on the setup. I will look into it some more. Thanks, Nuria On Wed, Nov 12, 2014 at 10:40 AM, Nuria Ruiz nu...@wikimedia.org wrote: To keep archives happy: Beta setup post events to http://bits.beta.wmflabs.org/event.gif http

Re: [Analytics] [LangEng] Analytics

2014-11-13 Thread Nuria Ruiz
Foundation jsahl...@wikimedia.org On Nov 13, 2014, at 9:42 AM, Nuria Ruiz nu...@wikimedia.org wrote: Hello, Taking last statement back, asked Yuvi and beta does have a varnish instance so the flow of EL events should be the same as in production. Now I looked on deployment-eventlogging02

Re: [Analytics] [LangEng] Analytics

2014-11-13 Thread Nuria Ruiz
...@wikimedia.org On Nov 13, 2014, at 9:42 AM, Nuria Ruiz nu...@wikimedia.org wrote: Hello, Taking last statement back, asked Yuvi and beta does have a varnish instance so the flow of EL events should be the same as in production. Now I looked on deployment-eventlogging02, which is the EL

Re: [Analytics] [LangEng] Analytics

2014-11-14 Thread Nuria Ruiz
the issue, and the fix is waiting approval from ops. Let's touch-base tomorrow to see if we see events. Leila On Thu, Nov 13, 2014 at 1:30 PM, Nuria Ruiz nu...@wikimedia.org wrote: Joel: I see, I was hoping to set aside the beta issues but if you are not deploying to prod any time soon I guess

Re: [Analytics] [LangEng] Analytics

2014-11-17 Thread Nuria Ruiz
be appreciated (maybe get the data in a way we could use some quick d3-based tool http://code.shutterstock.com/rickshaw/?). Thanks Pau On Mon, Nov 17, 2014 at 8:38 AM, Joel Sahleen jsahl...@wikimedia.org wrote: On Nov 17, 2014, at 9:13 AM, Nuria Ruiz nu...@wikimedia.org wrote: Since event

Re: [Analytics] [LangEng] Analytics

2014-11-17 Thread Nuria Ruiz
it was not possible to get things ready in advance. I find this approach could be problematic, but I'm happy to follow the Analytics advice on this. In any case, as said before, this is worth checking with product. Pau On Mon, Nov 17, 2014 at 12:17 PM, Nuria Ruiz nu...@wikimedia.org wrote: Joel, Please

Re: [Analytics] EventLogging data QA

2014-12-11 Thread Nuria Ruiz
Team: Besides the ability of testing in beta labs and the monitoring that ori highlighted, the incoming raw stream of events is available in 1003/1002 on port 8600. From 1002 or 1003 you can run: zsub vanadium.eqiad.wmnet:8600 and see the incoming stream. I am not sure that something beyond that

Re: [Analytics] EventLogging data QA

2014-12-15 Thread Nuria Ruiz
But I see that meanwhile a Phabricator task got added, and I guess I am alone with my judgement :-) Actually, I fully agree with you that no more infrastructure in this regard is needed and I think we were a little fast filing tasks here. I really think that every time we find ourselves testing in

Re: [Analytics] Switching the RD team to Phabricator

2014-12-15 Thread Nuria Ruiz
Also keeping two systems active could lead to requests going into two places Yes, this will certainly happen. On Mon, Dec 15, 2014 at 10:53 AM, Grace Gellerman ggeller...@wikimedia.org wrote: Should we talk more about this in our Research staff meeting on Tuesday? I agree that we need to

Re: [Analytics] EventLogging data QA

2014-12-15 Thread Nuria Ruiz
QA in beta labs is good but not enough. We still need to do QA when a feature goes to production and currently This is true but at the same time, I do not see anything in the description of your FF events that could not be tested on beta-labs. If we are talking add-block that can be tested even

[Analytics] Oozie 101 doc

2014-12-19 Thread Nuria Ruiz
(sending to public list) I have started a doc in wikitech that describes an oozie 101 example and goes a little into how to troubleshoot oozie jobs. Still WIP. Will update as work progresses: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Oozie Please edit/correct as needed.

Re: [Analytics] analytics-store replag s1 and s5

2014-12-22 Thread Nuria Ruiz
Adding mobile tech so they are aware, I am guessing we need to query for that data in a more efficient fashion. On Mon, Dec 22, 2014 at 4:10 AM, Sean Pringle sprin...@wikimedia.org wrote: Had to kill queries, lest analytics-store grind to a halt and take even longer to recover. These ones:

Re: [Analytics] Older open tickets in Phabricator with Unbreak now! priority

2014-12-29 Thread Nuria Ruiz
As Kevin is on vacation I have lowered the priority to Normal for the task we are not working on in the immediate future but left the other two at highest. Note that while those tickets do not have updates, related tickets do. The updates are visible going through the blocked by section. Thanks,

Re: [Analytics] Performance Visualization Frontend

2014-12-30 Thread Nuria Ruiz
Hello, The more important question is where will your data come from: event logging? graphite? elsewhere? Visualization comes secondary to this. EventLogging is a good solution for structured, somewhat complex, application data, graphite is a good solution for plain counters, which is well

[Analytics] Hive operator precedence

2015-01-30 Thread Nuria Ruiz
Team, Christian just let me know about the operator precedence in hive. Everyone writing queries should read about this, as precedence is not what you might expect and your query might end up taking forever, making other users unhappy.

Re: [Analytics] Virtual file view hack for Media Viewer views

2015-02-05 Thread Nuria Ruiz
. These will be collected and dumped separately, as per https://www.mediawiki.org/wiki/Requests_for_comment/Media_file_request_counts . Erik From: analytics-boun...@lists.wikimedia.org [mailto:analytics-boun...@lists.wikimedia.org] On Behalf Of Nuria Ruiz Sent: Wednesday

[Analytics] EL dropping events for about 8 hours from midnight Feb 5th to about 8am Feb 5th

2015-02-05 Thread Nuria Ruiz
Team: EL has dropped events for about 8 hours last night. The analytics team shall work on backfilling that data. Here is the backlog item associated to that task: https://phabricator.wikimedia.org/T88692 Thanks, Nuria

Re: [Analytics] Virtual file view hack for Media Viewer views

2015-02-05 Thread Nuria Ruiz
/Media_file_request_counts . Erik From: analytics-boun...@lists.wikimedia.org [mailto:analytics-boun...@lists.wikimedia.org] On Behalf Of Nuria Ruiz Sent: Wednesday, February 04, 2015 22:28 To: A mailing list for the Analytics Team at WMF and everybody who

Re: [Analytics] DNT, standards, and expectations [was: Re: [Wiki-research-l] Geo-aggregation of Wikipedia page views: Maximizing geographic granularity while preserving privacy – a proposal]

2015-01-14 Thread Nuria Ruiz
For example, not collecting usage data about certain sections of our population (e.g. IE10 users where DNT is set by default) means that we don't know if our software works for them. This isn't free, and in the long-term, it can have substantial negative effects. If DNT was always disabled by

Re: [Analytics] DNT, standards, and expectations [was: Re: [Wiki-research-l] Geo-aggregation of Wikipedia page views: Maximizing geographic granularity while preserving privacy – a proposal]

2015-01-14 Thread Nuria Ruiz
that there is a big detachment between user expectations of DNT and what the protocol actually does, and so we should probably avoid treating that protocol as a flag. On 14 January 2015 at 13:45, Nuria Ruiz nu...@wikimedia.org wrote: For example, not collecting usage data about certain sections

Re: [Analytics] DNT, standards, and expectations

2015-01-16 Thread Nuria Ruiz
What I find concerning is the idea that a biased subset of our users would be categorically ignored for this type of evaluation. If you agree with me that such evaluation is valuable to our users, I think you ought to also find such categorical exclusions concerning. Dan has mentioned a possible

Re: [Analytics] DNT, standards, and expectations

2015-01-16 Thread Nuria Ruiz
in detail behavior of users that use, say, opera mini (made up example) On Fri, Jan 16, 2015 at 9:10 AM, Nuria Ruiz nu...@wikimedia.org wrote: What I find concerning is the idea that a biased subset of our users would be categorically ignored for this type of evaluation. If you agree with me

Re: [Analytics] eventlogging master

2015-02-16 Thread Nuria Ruiz
For switchover of writes, we'll need to coordinate an EL consumer restart to use a new CNAME of m4-master.eqiad.wmnet This is a configuration change on the EL config plus a small downtime and a re-start (easy). I am not sure how user/passwords are set up on the config so cc-ing otto to keep him in

Re: [Analytics] Frame Timing API

2015-02-16 Thread Nuria Ruiz
Hello, My 2 cents: Tracking scrolling issues (jank) down is not easily done and in that case the API seems that it might actually help you quantify the performance gains/losses from making the scrolling experience smoother across your user base (just an example). Still, it seems a pretty low

Re: [Analytics] Rough estimate of percentage of requests without Javascript enabled/capable clients

2015-02-18 Thread Nuria Ruiz
UA detection precision in general. Do you think it's worth getting the UA distribution for CSS requests correlate it with the distribution for page / JS loading? Gabriel On Wed, Feb 18, 2015 at 7:17 AM, Nuria Ruiz nu...@wikimedia.org wrote: Sorry I forgot to address this earlier: Do you

Re: [Analytics] Rough estimate of percentage of requests without Javascript enabled/capable clients

2015-02-16 Thread Nuria Ruiz
16, 2015 at 6:38 PM, Nuria Ruiz nu...@wikimedia.org wrote: Gabriel: I have run through the data and have a rough estimate of how many of our pageviews are requested from browsers w/o strong javascript support. It is a preliminary rough estimate but I think it is pretty useful. TL;DR According

Re: [Analytics] Beta Labs EventLogging logs

2015-01-11 Thread Nuria Ruiz
. Kaldari On Wed, Jan 7, 2015 at 1:27 PM, Ryan Kaldari rkald...@wikimedia.org wrote: Ah, sorry, I was looking on the wrong server (deployment-bastion). Thanks! On Wed, Jan 7, 2015 at 1:21 PM, Nuria Ruiz nu...@wikimedia.org wrote: Ahem they are there: nuria@deployment-eventlogging02:/var/log

Re: [Analytics] Performance Visualization Frontend (Christy Okpo)

2015-01-11 Thread Nuria Ruiz
to then visualize the information? Message: 1 Date: Tue, 30 Dec 2014 07:37:35 -0800 From: Nuria Ruiz nu...@wikimedia.org To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics. analytics@lists.wikimedia.org Subject: Re

Re: [Analytics] WikiGrok and EventLogging

2015-01-06 Thread Nuria Ruiz
(cc-ing mobile-tech) Since we do not know the details of how wikigrok is used and its throughput of requests we cannot estimate sampling ourselves. I imagine wikigrok is being deployed to a number of users and it is with that usage the mobile team could estimate the total throughput expected, with

Re: [Analytics] WikiGrok and EventLogging

2015-01-08 Thread Nuria Ruiz
WikiGrok to 10 out of every 62 users or ~16% (the userToken is a base 62 number). That should give us an estimated 27 hits per second. Does that work for everyone? Kaldari On Thu, Jan 8, 2015 at 2:06 PM, Nuria Ruiz nu...@wikimedia.org wrote: We cannot guarantee that with 60 events a sec things
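The "10 out of every 62 users" figure in the snippet above can be checked with a small Python sketch. Everything beyond the base-62 detail is an assumption made for illustration: the thread does not say which character of the userToken is inspected or which 10 symbols are selected, so the choice of the last character, the digit set, and the names `BASE62_ALPHABET`, `SAMPLED_SYMBOLS` and `in_sample` are hypothetical. The sketch also assumes tokens are uniformly distributed over the alphabet.

```python
import string

# A base-62 alphabet: 10 digits plus 52 ASCII letters.
BASE62_ALPHABET = string.digits + string.ascii_letters

# Hypothetical choice: users whose token ends in a digit are sampled.
SAMPLED_SYMBOLS = set(string.digits)


def in_sample(user_token: str) -> bool:
    """Return True if this token falls in the sampled bucket."""
    return user_token[-1] in SAMPLED_SYMBOLS


# Fraction of users selected when tokens are uniformly distributed.
sampling_fraction = len(SAMPLED_SYMBOLS) / len(BASE62_ALPHABET)
```

Under those assumptions `sampling_fraction` is 10/62, which matches the roughly 16% quoted in the snippet.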

Re: [Analytics] Pageviews update

2015-01-07 Thread Nuria Ruiz
I am not sure if this is quite what you are asking but just in case: For streaming it is probably easier for you to use the newly created webrequest tables: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive#Webrequest_Table.28s.29 Those include an isPageview field so requests are

Re: [Analytics] Only parts of EventLogging events getting written to the database since 2015-01-07 ~1:55

2015-01-07 Thread Nuria Ruiz
Incident documentation updated: https://wikitech.wikimedia.org/wiki/Incident_documentation/20150107-EventLogging On Wed, Jan 7, 2015 at 10:58 AM, Nuria Ruiz nu...@wikimedia.org wrote: Team: Issues on event logging have been solved, outage of client side events (did not affect server side

Re: [Analytics] Beta Labs EventLogging logs

2015-01-07 Thread Nuria Ruiz
Ahem they are there: nuria@deployment-eventlogging02:/var/log/upstart$ ls eventlogging_*log eventlogging_processor-client-side-events.log eventlogging_processor-server-side-events.log On Wed, Jan 7, 2015 at 12:57 PM, Ryan Kaldari rkald...@wikimedia.org wrote: It seems the EventLogging

Re: [Analytics] Only parts of EventLogging events getting written to the database since 2015-01-07 ~1:55

2015-01-07 Thread Nuria Ruiz
Kaldari: Expanding a bit to what Dan said: We took up EL from ori's basically 6 months ago. The operational support analytics provide is documented here: https://www.mediawiki.org/wiki/EventLogging/OperationalSupport EL has several parts and while we have not done much development on the mw

Re: [Analytics] WikiGrok and EventLogging

2015-01-07 Thread Nuria Ruiz
0.45% (1.25/sec) MobileWikiAppSearch 0.41% (1.13/sec) CentralAuth 0.40% (1.12/sec) On Wed, Jan 7, 2015 at 5:12 PM, Nuria Ruiz nu...@wikimedia.org wrote: We're talking about a total of ~170 events per

Re: [Analytics] Making EventLogging output to a log file instead of the DB

2015-01-07 Thread Nuria Ruiz
, 2015 at 10:32 AM, Nuria Ruiz nu...@wikimedia.org wrote: I believe there is already an EL-Kafka pipeline and this would make it easy to integrate page views with our regular processing. Note that the pipeline was disabled 6 months ago and thus my comment in the near term https://github.com

Re: [Analytics] Only parts of EventLogging events getting written to the database since 2015-01-07 ~1:55

2015-01-07 Thread Nuria Ruiz
Team: Issues on event logging have been solved, outage of client side events (did not affect server side events) lasted about 12 hours. Please see: http://picpaste.com/Screen_Shot_2015-01-07_at_10.50.28_AM-NsMSPgHp.png Thanks, Nuria On Wed, Jan 7, 2015 at 3:57 AM, Christian Aistleitner

Re: [Analytics] Setting up eventlogging-devserver

2015-01-07 Thread Nuria Ruiz
Roxana: You are correct, the devserver is broken in vagrant at this time. However that doesn't mean you cannot instrument your code and see events on console. We shall try to have a patch for the devserver soon but, as I said, that should not block your development. Thanks, Nuria On Tue, Jan 6,

Re: [Analytics] eventlogging master

2015-03-16 Thread Nuria Ruiz
at 2:09 AM, Andre Klapper aklap...@wikimedia.org wrote: On Thu, 2015-02-26 at 11:25 +1000, Sean Pringle wrote: On Sun, Feb 22, 2015 at 1:20 PM, Nuria Ruiz nu...@wikimedia.org wrote: Coordination on Monday sounds good. Did you guys come to any conclusion about vanadium? Could someone

Re: [Analytics] eventlogging master

2015-03-16 Thread Nuria Ruiz
Ticket for box upgrade is here: https://phabricator.wikimedia.org/T90363 On Mon, Mar 16, 2015 at 10:04 AM, Nuria Ruiz nu...@wikimedia.org wrote: Did you guys come to any conclusion about vanadium? Sorry about missing this. Ori has requested two EL hosts, those were granted two weeks ago

[Analytics] [Technical] Eventlogging documentation on wikitech

2015-03-17 Thread Nuria Ruiz
Team: All work we have been doing thus far with EventLogging is documented in wikitech: *- General management of system (restarting, graphite, database)* https://wikitech.wikimedia.org/wiki/EventLogging *- Backfilling:* https://wikitech.wikimedia.org/wiki/EventLogging/Backfilling *- Beta labs

Re: [Analytics] [Technical] which pageview definition

2015-03-15 Thread Nuria Ruiz
What I would recommend is using the new data in wmf.webrequests, which gives you, as you say, about 2.5 months, and filtering the user agent; there are a couple of UDFs for user agent detection, including isSpider, which also looks for wikimedia-specific bots that ua-parser ignores. So you know

Re: [Analytics] MySQL binlog - JSON in Kafka

2015-03-16 Thread Nuria Ruiz
Indeed, we could use this one for a bunch of things. On Mon, Mar 16, 2015 at 7:25 AM, Andrew Otto ao...@wikimedia.org wrote: Whoa, kinda cool: https://github.com/pyr/sqlstream Maybe useful as a non-intrusive way of getting a change event stream out of Mediawiki without making application

Re: [Analytics] [Cluster] Monitoring the impact Hive jobs have on the Analytics cluster

2015-03-07 Thread Nuria Ruiz
Thanks much Christian for the writeup. Should we have icinga alarms around these types of issues? Seems like that would be the way to go. Thanks, Nuria On Sat, Mar 7, 2015 at 4:00 PM, Andrew Otto ao...@wikimedia.org wrote: Thanks Christian! On Mar 7, 2015, at 09:14, Christian Aistleitner

Re: [Analytics] Partial outage of Event Logging on March 20th

2015-03-25 Thread Nuria Ruiz
Issues were resolved promptly and analytics team shall backfill client side events that were dropped on the 20th as a result of the outage. This work is now completed. On Mon, Mar 23, 2015 at 10:16 AM, Nuria Ruiz nu...@wikimedia.org wrote: Hello, Eventlogging had some issues on March 20th

Re: [Analytics] [Discussion] User agent data releases

2015-03-03 Thread Nuria Ruiz
Erik has asked me to write an exploratory app for user-agent data. The idea is to enable Product Managers and engineers to easily explore what users use so they know what to support. I've thrown up an example screenshot at http://ironholds.org/agents_example_screen.png I cannot speak as to the

Re: [Analytics] Rough estimate of percentage of requests without Javascript enabled/capable clients

2015-03-01 Thread Nuria Ruiz
there's a fresh start with caching. /braindump — Timo On 18 Feb 2015, at 18:07, Nuria Ruiz nu...@wikimedia.org wrote: Do you think it's worth getting the UA distribution for CSS requests correlate it with the distribution for page / JS loading? Yes, we can do that. I would need to gather

Re: [Analytics] Rough estimate of percentage of requests without Javascript enabled/capable clients

2015-03-01 Thread Nuria Ruiz
Note that couple days worth of traffic might be more than a 1 billion requests for javascript on bits. Sorry, correction. Couple days worth of javascript bits requests comes up to 100 million requests not a 1000 million. On Sun, Mar 1, 2015 at 4:35 PM, Nuria Ruiz nu...@wikimedia.org wrote

Re: [Analytics] [Technical][Debate] Historical client ip and geocoded data

2015-02-23 Thread Nuria Ruiz
If I remember correctly, Chris had the maxmind db on github with a script that updates it and commits changes, thus making it possible to play back time and get the state of the db as it was when that data was calculated. I think Dan has that script cron running in his homedir, if we could

Re: [Analytics] Provenance Params

2015-02-23 Thread Nuria Ruiz
I favor a URL solution because I think it is easier to parse and maintain. https://en.wikipedia.org/wiki/ref=app/Barack_Obama; I also think the supported set of reftags should be very short, for example: https://aws.amazon.com/marketplace/help/201349870 Note that Varnish supports url rewriting:

Re: [Analytics] [Product] Testing the new Pageviews implementations

2015-02-23 Thread Nuria Ruiz
Aha, so if we never hit the read-mode Varnishes we can ignore anything about this? Great. The answer .. ahem .. would be no. Not really. But you knew that probably. I think James has a point in saying that it is not so easy to see what might affect requests, I certainly agree given the e-mails I

Re: [Analytics] [Technical] eventlogging master

2015-02-25 Thread Nuria Ruiz
CC-ing Ori. He mentioned he was given a box today but no further details. Thanks, Nuria On Wed, Feb 25, 2015 at 5:25 PM, Sean Pringle sprin...@wikimedia.org wrote: On Sun, Feb 22, 2015 at 1:20 PM, Nuria Ruiz nu...@wikimedia.org wrote: Coordination on Monday sounds good. Did you guys come

Re: [Analytics] Provenance Params

2015-02-24 Thread Nuria Ruiz
If there’s no other objection, we can safely fold this under the discussion of long-term options and go ahead with the proposed implementation, per Dan. I think there are some technical issues to be ironed out, right? 1. How are we doing so a request like:

Re: [Analytics] Virtual file view hack for Media Viewer views

2015-03-19 Thread Nuria Ruiz
whether the hits to the beacon URI are picked up by varnishkafka or not at the moment, since he set up the endpoint. On Wed, Mar 18, 2015 at 3:42 PM, Nuria Ruiz nu...@wikimedia.org wrote: Gilles: And we know this data is coming via varnishkafka into the cluster, right? Did we check

Re: [Analytics] Something's up with EventLogging since Jan 7th

2015-01-29 Thread Nuria Ruiz
look like everything is making it to the DB, I'll keep investigating tomorrow. On Wed, Jan 28, 2015 at 5:43 PM, Nuria Ruiz nu...@wikimedia.org wrote: Gilles: This event has a pretty constant rate of input: http://graphite.wikimedia.org/render/?width=588height=311_salt=1422494956.516from=00

[Analytics] Partial outage of Event Logging on March 20th

2015-03-23 Thread Nuria Ruiz
Hello, EventLogging had some issues on March 20th due to an inflow of client-side events higher than the system can support. The inflow was due to the new instrumentation deployed for Wikitext to be able to compare Wikitext usage with VisualEditor usage. Issues were resolved promptly and analytics

Re: [Analytics] measuring traffic on mobile web beta

2015-04-02 Thread Nuria Ruiz
Sorry, this should be: Mobile web beta does not have any special url. It is triggered by a cookie. If the COOKIE that identifies 'mobile-web-beta' is stripped off in varnish (something you can ask your devs about)... On Thu, Apr 2, 2015 at 5:05 PM, Nuria Ruiz nu...@wikimedia.org wrote: (cc-ing

Re: [Analytics] Task for your attention - update to app uniques and session reports

2015-04-22 Thread Nuria Ruiz
Please cc analytics@ so the whole team sees these requests. On Wed, Apr 22, 2015 at 3:09 PM, Dan Garry dga...@wikimedia.org wrote: Hey Kevin, Task for your attention: T96926 https://phabricator.wikimedia.org/T96926 The following patches are ready to be merged in the iOS and Android apps

Re: [Analytics] [Technical] X-analytics header mobile apps items

2015-04-21 Thread Nuria Ruiz
Wednesday to write this up. -Adam On Mon, Apr 20, 2015 at 8:14 PM, Nuria Ruiz nu...@wikimedia.org wrote: Ping ... On Fri, Apr 17, 2015 at 7:45 AM, Adam Baso ab...@wikimedia.org wrote: Sure thing. Dan and Bernd I'll sync up with you on this. On Fri, Apr

Re: [Analytics] [Technical] Strange behavior of EL m4-master

2015-04-15 Thread Nuria Ruiz
This sounds like the fixes we did last quarter to the batch insertion basically hid the problem instead of making it go away. I think we are mixing things here: when we had issues with the batching code we never saw a pattern of no-events-whatsoever-in-any-table for an hour. We saw events dropped in

Re: [Analytics] measuring traffic on mobile web beta

2015-04-15 Thread Nuria Ruiz
Some things to keep in mind: 1) Bots AND user_agent_map['device_family'] Spider Doesn't remove all bots, only very prominent ones, so stats still include traffic from, say, wmf robots. 2) Sampling: Strangely, the event-logs for specific actions showed much higher traffic for

Re: [Analytics] [Technical] Strange behavior of EL m4-master

2015-04-15 Thread Nuria Ruiz
Given that the batching code has been deployed since earlier (March 16th) than the 1st event listed by Marcel (April 9th), and that since then we have swapped the EL box (April 3rd/4th), we probably want to look at system issues. In my opinion it is probably easier to see with tcpdump whether inserts are

Re: [Analytics] Fwd: [Web] MediaWiki: MobileFrontend dashboard

2015-04-14 Thread Nuria Ruiz
Anyone know what powers, or more correctly what *should* power, the MediaWiki: MobileFrontend dashboard [0]? I'm hoping that it's data from the NavigationTiming extension but I've been known to be wrong. I do not think so. You can see if data is reported to graphite if you look at the network

Re: [Analytics] Wikipedia Page View Stats

2015-10-23 Thread Nuria Ruiz
>Are there any related Phabricator task IDs to share? Any analytics task marked as {slug}. For example: https://phabricator.wikimedia.org/search/query/lVxHj15dmctY/ On Fri, Oct 23, 2015 at 10:28 AM, Andre Klapper wrote: > On Fri, 2015-10-23 at 09:01 -0400, Dan Andreescu

Re: [Analytics] noccokie tag on X-analytics

2015-10-21 Thread Nuria Ruiz
y. > > On Wed, Oct 21, 2015 at 1:49 PM, Nuria Ruiz <nu...@wikimedia.org> wrote: > >> >What was the motivation for this change? Just looking for possible >> automata? >> Right. The motivation was to see if the absence of cookies works as a >> cheap

[Analytics] noccokie tag on X-analytics

2015-10-21 Thread Nuria Ruiz
Team: As of today incoming request data includes an extra bit of information on the X-analytics header. If an incoming request to any Wikipedia project had no cookies whatsoever it will be tagged with nocookie=1. A request without any cookies could correspond to a fresh browser session, a user

[Analytics] Developer summit session - Pageview API

2015-10-28 Thread Nuria Ruiz
Hello! The analytics team is planning to give a presentation about the Pageview API we are working on at the developer summit (we are hoping to announce the API pretty soon). Please feel free to add to the ticket use cases you would like to talk about regarding the Pageview API or any discussion

[Analytics] Transitioning wikistats pageview reports to use new pageview definition

2015-11-10 Thread Nuria Ruiz
Hello! The analytics team wishes to announce that we have finally transitioned several of the pageview reports in stats.wikimedia.org to the new pageview definition [1]. This means that we should no longer have two conflicting sources of pageview numbers. While we are not fully done
