Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-02-07 Thread Andrew Otto
Can we keep further discussion on the phablet thread?

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-02-07 Thread Nuria Ruiz
>Regarding the last few posts about the geolocation information, from the data analysis perspective, there is indeed another, more serious concern about using the GeoIP cookie: >It will create significant discrepancies with the existing geolocation data we record for pageviews, where we have

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-02-07 Thread Andrew Otto
Gonna paste your reply on the ticket and respond there. On Wed, Feb 7, 2018 at 1:29 PM, Tilman Bayer wrote: > On Wed, Feb 7, 2018 at 9:19 AM, Andrew Otto wrote: > >> It will create significant discrepancies

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-02-07 Thread Tilman Bayer
On Wed, Feb 7, 2018 at 9:19 AM, Andrew Otto wrote: >> It will create significant discrepancies with the existing geolocation >> data we record for pageviews > If you only need country (or whatever is in the cookie), then likely > whatever the output dataset is would only

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-02-07 Thread Andrew Otto
> It will create significant discrepancies with the existing geolocation data we record for pageviews If you only need country (or whatever is in the cookie), then likely whatever the output dataset is would only include country when selecting from pageviews. If you need more than country (it

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-02-07 Thread Tilman Bayer
Thanks everyone! Separate from Sam's mapping out the frontend instrumentation work at https://phabricator.wikimedia.org/T184793 , I have created a task for the backend work at https://phabricator.wikimedia.org/T186728 based on this thread. Regarding the last few posts about the geolocation

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-02-07 Thread Sam Smith
Just a quick update: I've captured details from this discussion and the background in https://phabricator.wikimedia.org/T184793. I'd sure appreciate your feedback. -Sam

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-02-01 Thread Nuria Ruiz
>Wow Sam, yeah, if this cookie works for you, it will make many things much easier for us. This is how it is done on performance schemas for Navigation timing data per country, so there is a precedent.

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-02-01 Thread Andrew Otto
Wow Sam, yeah, if this cookie works for you, it will make many things much easier for us. Check it out and let us know. If it doesn’t work for some reason, we can figure out the backend geocoding part. On Thu, Feb 1, 2018 at 2:43 AM, Sam Smith wrote: > On Tue, Jan

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-02-01 Thread Sam Smith
On Tue, Jan 30, 2018 at 8:02 AM, Andrew Otto wrote: > > Using the GeoIP cookie will require reconfiguring the EventLogging > varnishkafka instance [0] > > I’m not familiar with this cookie, but, if we used it, I thought it would > be sent back by the client in the event.

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-01-30 Thread Nuria Ruiz
>I’m not totally sure if this works for you all, but I had pictured generating aggregates from the page preview events, and then joining the page preview aggregates with the pageview aggregates into a new table with an extra dimension specifying which type of content view was made. In my opinion
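One way to picture the combined table described above is a minimal sketch like the following. It is not the Analytics team's actual job: the row shape (`project`, `page`, `country`, `count`), the `view_type` values, and both function names are assumptions made for illustration. Note that it concatenates the two aggregates with a type column (a union) rather than performing a key-based join.

```javascript
// Sketch only. Hypothetical row shape: { project, page, country, count }.
// Tags each aggregate row with the kind of content view it represents and
// concatenates both inputs into one dataset with an extra dimension.
function combineContentViews(pageviewAggregates, previewAggregates) {
  const tag = (rows, viewType) =>
    rows.map((row) => ({ ...row, view_type: viewType }));
  return [
    ...tag(pageviewAggregates, "pageview"),
    ...tag(previewAggregates, "preview"),
  ];
}

// Sums counts per project/page across all view types.
function totalByPage(rows) {
  const totals = new Map();
  for (const row of rows) {
    const key = `${row.project}/${row.page}`;
    totals.set(key, (totals.get(key) || 0) + row.count);
  }
  return totals;
}
```

Keeping `view_type` as a dimension leaves the existing pageview numbers recoverable by filtering, while still allowing a summed "content consumption" figure per page.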

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-01-30 Thread Andrew Otto
CoOOOl :) > Using the GeoIP cookie will require reconfiguring the EventLogging varnishkafka instance [0] I’m not familiar with this cookie, but, if we used it, I thought it would be sent back by the client in the event. E.g. event.country = response.headers.country; EventLogging.emit(event);
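To make the client-side idea concrete, here is a rough sketch of reading a country code out of a cookie string and attaching it to an event before it is sent. The cookie name `GeoIP`, the colon-separated value with the country code first, and the function names are all assumptions for illustration; none of them are confirmed in this thread.

```javascript
// Sketch only. Assumes a cookie named "GeoIP" whose value is colon-separated
// with the country code as the first field, e.g. "US:CA:City:37.77:-122.42:v4".
// Returns the country code, or null if the cookie is absent or empty.
function countryFromCookies(cookieString) {
  for (const pair of cookieString.split(";")) {
    const trimmed = pair.trim();
    const eq = trimmed.indexOf("=");
    if (eq === -1 || trimmed.slice(0, eq) !== "GeoIP") {
      continue;
    }
    const country = trimmed.slice(eq + 1).split(":")[0];
    return country === "" ? null : country;
  }
  return null;
}

// Returns a copy of the event with a country field added (hypothetical field name).
function withCountry(event, cookieString) {
  return { ...event, country: countryFromCookies(cookieString) };
}
```

In a browser the cookie string would come from `document.cookie`; passing it in as an argument keeps the sketch testable outside a page.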

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-01-19 Thread Adam Baso
Thanks. On Fri, Jan 19, 2018 at 12:30 PM, Nuria Ruiz wrote: > >Thanks, good to know - is there a report around that? I'm wondering how > "missing requests" ought to be expressed with some margin of error. > I think the team that can quantify this best is yours. If

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-01-19 Thread Nuria Ruiz
>Thanks, good to know - is there a report around that? I'm wondering how "missing requests" ought to be expressed with some margin of error. I think the team that can quantify this best is yours. If anything, from what I remember from the popups experiments, the inflow of events was higher than

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-01-19 Thread Nuria Ruiz
>So maybe it's worth considering which approach takes us closer to that? AIUI the beacon puts the record into the webrequest table and from there it would only take some trivial preprocessing to replace the beacon URL with the virtual URL and add the beacon type as a "virtual_type" field or

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-01-19 Thread Adam Baso
> >Thanks, Sam. Nuria, that's what I was getting at - if using the EL JS > library would some sort of new method be needed so that these impressions > aren't undercounted? > If we had a lot of users with DNT, maybe, from our tests when we enabled > that on EL this is not the case. > Thanks, good

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-01-19 Thread Andrew Otto
> You could join these together in a broader ‘content consumption’ dataset somehow, either in Hadoop with batch jobs, or more realtime with streaming jobs. Hm, idea…which I think has been mentioned before: Could we leave pageviews as is, but make a new dataset that counts both pageviews and page

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-01-19 Thread Nuria Ruiz
>Thanks, Sam. Nuria, that's what I was getting at - if using the EL JS library would some sort of new method be needed so that these impressions aren't undercounted? If we had a lot of users with DNT, maybe, but from our tests when we enabled that on EL this is not the case. Your team has already run

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-01-19 Thread Andrew Otto
> For virtual pageviews, people will probably be more interested in reports that belong to the first group (summing them up with normal pageviews, breaking them down along the dimensions that are relevant for web traffic, counting them for a given URL etc). Ah! Ok I get this use case now. I

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-01-19 Thread Adam Baso
Thanks, Sam. Nuria, that's what I was getting at - if using the EL JS library would some sort of new method be needed so that these impressions aren't undercounted? On Fri, Jan 19, 2018 at 4:49 AM, Sam Smith wrote: > On Thu, Jan 18, 2018 at 9:57 PM, Adam Baso

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-01-19 Thread Sam Smith
On Thu, Jan 18, 2018 at 9:57 PM, Adam Baso wrote: > Adding to this, one thing to consider is DNT - is there a way to invoke EL > so that such traffic is appropriately imputed or something? > The EventLogging client respects DNT [0]. When the user enables DNT,
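As a rough illustration of what "respects DNT" can mean on the client, the sketch below skips sending an event when a Do Not Track signal is present. This is not the EventLogging client's actual code: the function names are hypothetical, and the set of signal values checked (`"1"`, `"yes"`, and the legacy `msDoNotTrack` property) is an assumption about what browsers have historically reported.

```javascript
// Sketch only. `nav` is a navigator-like object; in a browser this would be
// window.navigator. Returns true when a Do Not Track signal is present.
function isDntEnabled(nav) {
  return (
    nav.doNotTrack === "1" ||
    nav.doNotTrack === "yes" ||
    nav.msDoNotTrack === "1"
  );
}

// Calls `send(event)` only when DNT is not enabled. Returns whether it sent.
function logUnlessDnt(nav, event, send) {
  if (isDntEnabled(nav)) {
    return false;
  }
  send(event);
  return true;
}
```

Because such events are dropped on the client and never reach the server, any undercount from DNT would have to be estimated separately rather than read from the logged data.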

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-01-19 Thread Gergo Tisza
On Thu, Jan 18, 2018 at 3:56 PM, Nuria Ruiz wrote: > Event logging use cases are events, as we move to a thicker client -more > javascript heavy- you will be needing to measure events for -nearly- > everything, whether those are to be considered "content consumption" or "ui >

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-01-18 Thread Nuria Ruiz
> I don't see how this addresses Gergo's larger point about the difference between consistently tallying content consumption (pageviews, previews, mediaviewer image views) and analyzing UI interactions (which is the main use case that EventLogging has been developed and used for). Event logging

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-01-18 Thread Andrew Otto
> Are you saying that the server load generated by such an additional aggregation query would be a blocker? If yes, how about we combine the two (for pageviews and previews) into one? Sorry, no it isn’t a blocker. The tagging logic that Nuria and others have been working on for a while now

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-01-18 Thread Tilman Bayer
On Thu, Jan 18, 2018 at 10:45 AM, Andrew Otto wrote: > > the beacon puts the record into the webrequest table and from there it > would only take some trivial preprocessing > ‘Trivial’ preprocessing that has to look through 150K requests per second! > This is a lot of work! >

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-01-18 Thread Andrew Otto
> For example, UI instrumentations on the web are almost always sampled, because that yields enough data to answer UI questions - but on the other hand tend to record much more detail about the individual interaction. In contrast, we register all pageviews unsampled, but don't keep a permanent

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-01-18 Thread Tilman Bayer
On Thu, Jan 18, 2018 at 8:16 AM, Nuria Ruiz wrote: > Gergo, > > >while EventLogging data gets stored in a different, unrelated way > Not really. This has changed quite a bit over the last two quarters. > EventLogging data now gets preprocessed and refined similar

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-01-18 Thread Nuria Ruiz
>Adding to this, one thing to consider is DNT - is there a way to invoke EL so that such traffic is appropriately imputed or something? I am not sure what you are asking ... On Thu, Jan 18, 2018 at 1:57 PM, Adam Baso wrote: > (I'd defer to the Readers Web team with Tilman

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-01-18 Thread Andrew Otto
> In particular, will we be able to sort by country, OS, Browser, etc? OS, Browser, yes. User Agent parsing is done by the EventLogging processors. Country not quite as easily, as EventLogging does not include client IP addresses. We could consider putting this back in somehow, or, I’ve also

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-01-18 Thread Olga Vasileva
Hi all, I just want to confirm that the proposed method using Eventlogging will allow us to gather data in a similar fashion to the web request table. In particular, will we be able to sort by country, OS, Browser, etc? Our goal here is to be able to consider the new page interactions metric on

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-01-18 Thread Andrew Otto
> the beacon puts the record into the webrequest table and from there it would only take some trivial preprocessing ‘Trivial’ preprocessing that has to look through 150K requests per second! This is a lot of work! > tracking of events is better done on an event based system and EL is such a

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-01-18 Thread Nuria Ruiz
Gergo, >while EventLogging data gets stored in a different, unrelated way Not really. This has changed quite a bit over the last two quarters. EventLogging data now gets preprocessed and refined similarly to how webrequest data is preprocessed and refined. You can have a dashboard on top

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-01-18 Thread Sam Smith
On Wed, Jan 17, 2018 at 6:46 PM, Leila Zia wrote: > On Wed, Jan 17, 2018 at 1:51 AM, Sam Smith wrote: > > > IMO #1 is preferable from the operations and performance perspectives as > the > > response is always served from the edge and includes very

Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

2018-01-17 Thread Gergo Tisza
On Wed, Jan 17, 2018 at 10:54 AM, Nuria Ruiz wrote: > Recording "preview_events" is really no different than recording any other > kind of UI event; the difference is going to come from scale, if anything, as > there are probably tens of thousands of those per second (I think your