>I'm not sure why a beacon would have to be a dummy html file, thus
confusing PV stats.

>Could it not be a dummy image request, more in line with the one pixel
images that are often used.


Right. I agree a dummy image makes more sense.
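
For concreteness, here is a minimal sketch of how such a dummy-image beacon could be built on the client side and recognised in the request logs. It assumes the URL shape from Gergo's proposal quoted below (Virtual-imageview-<real image name>/<size>px-thumbnail.<ext>) and the 204 answer Nuria suggested. The function names, the percent-encoding of the image name and the regular expression are my own assumptions for illustration, not existing code.

```python
import re
from typing import Optional, Tuple
from urllib.parse import quote, unquote

# Path prefix taken from the proposal in this thread; everything else here
# is a hypothetical illustration, not deployed code.
VIRTUAL_PREFIX = "/wikipedia/commons/thumb/0/00/Virtual-imageview-"

_VIRTUAL_RE = re.compile(
    re.escape(VIRTUAL_PREFIX)
    + r"(?P<name>[^/]+)/(?P<size>\d+)px-thumbnail\.(?P<ext>\w+)$"
)


def virtual_view_path(image_name: str, size: int, ext: str) -> str:
    """Build the fake thumbnail path requested once per actual image view."""
    # Assumption: the real image name is percent-encoded into one path segment.
    return f"{VIRTUAL_PREFIX}{quote(image_name, safe='')}/{size}px-thumbnail.{ext}"


def parse_virtual_view(path: str) -> Optional[Tuple[str, int]]:
    """Return (image_name, size) if the path is a virtual view hit, else None."""
    m = _VIRTUAL_RE.match(path)
    if m is None:
        return None
    return unquote(m.group("name")), int(m.group("size"))


def beacon_status(path: str) -> Optional[int]:
    """Status a cache rule could answer with directly: 204 for beacon hits.

    None means "not a beacon, handle the request as usual".
    """
    return 204 if parse_virtual_view(path) is not None else None
```

Filtering the request logs then amounts to calling parse_virtual_view on each logged path and counting the non-None results per image name.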

On Thu, Feb 5, 2015 at 2:28 PM, Erik Zachte <ezac...@wikimedia.org> wrote:

> I'm not sure why a beacon would have to be a dummy html file, thus
> confusing PV stats.
>
> Could it not be a dummy image request, more in line with the one pixel
> images that are often used.
>
> This way Oliver can relax, go on vacation for real, without keeping a
> close watch over PV definitions.
>
>
>
> *From:* analytics-boun...@lists.wikimedia.org
> [mailto:analytics-boun...@lists.wikimedia.org] *On Behalf Of *Dan Andreescu
> *Sent:* Thursday, February 05, 2015 22:43
>
> *To:* A mailing list for the Analytics Team at WMF and everybody who has
> an interest in Wikipedia and analytics.
> *Subject:* Re: [Analytics] Virtual file view hack for Media Viewer views
>
>
>
> Nuria & Erik: you're totally right, I keep forgetting this problem is more
> complicated than I think.
>
>
>
> So we should figure out how this statsv magic thing works and see if we
> can use it here.
>
>
>
> On Thu, Feb 5, 2015 at 4:41 PM, Nuria Ruiz <nu...@wikimedia.org> wrote:
>
> >[Oliver] My point was more that we should try to avoid traffic-generating
>
> >[Oliver] requests that exist solely as a hack for analytics purposes;
>
> >[Dan] Is this a potential solution to Oliver's concern:
>
>
>
> I disagree that we should be concerned about "beacons" to identify preloads.
> Just as beacons exist for ads or stats, using one to identify preloads doesn't
> seem far-fetched (I have certainly used similar code before and it did its
> job).
>
>
>
> Note that EL works in a similar fashion, requesting a "fake" image from
> varnish, to which we answer with a 204. It is very similar, and the reason
> we have such code is that we do not have a specific endpoint or
> domain where requests of this type could go. Pretty much everything requested
> by our users and ourselves ends up in varnish.
>
> On Thu, Feb 5, 2015 at 12:46 PM, Dan Andreescu <dandree...@wikimedia.org>
> wrote:
>
> Is this a potential solution to Oliver's concern:
>
>
>
> For "real" image views, add an X-Analytics header value of
> "real-view=true" to the request itself?
>
>
>
> If that's not feasible, we should look into using statsv for this (not
> sure how that works) or having this be a different kafka topic and not
> consumed into HDFS.
>
>
>
> On Thu, Feb 5, 2015 at 11:59 AM, Toby Negrin <tneg...@wikimedia.org>
> wrote:
>
> I created a card -- modify as desired:
>
>
>
> https://trello.com/c/HMgVD4mz
>
>
>
> -Toby
>
>
>
> On Thu, Feb 5, 2015 at 8:51 AM, Toby Negrin <tneg...@wikimedia.org> wrote:
>
> It turns out that the media viewer (on desktop; don't know about mobile)
> does a lot of caching so just because an image is loaded from swift, it
> doesn't mean it is viewed. We'd like to provide more accurate stats to the
> GLAM folks, so yes, I think this needs to be added eventually. Let's leave
> it out of scope for now.
>
>
>
> -Toby
>
>
>
> On Thu, Feb 5, 2015 at 8:46 AM, Oliver Keyes <oke...@wikimedia.org> wrote:
>
> We want to include these files in the pageview definition? :/.
>
> My point was more that we should try to avoid traffic-generating
> requests that exist solely as a hack for analytics purposes; it's
> artificial work for both users and us. If this is the only way of
> doing things that's totally fine.
>
>
> On 5 February 2015 at 11:38, Toby Negrin <tneg...@wikimedia.org> wrote:
> > Hi Gergo -- I like this idea.  As far as capacity, any EL-Hadoop based
> > solution would be basically doing the same thing as you propose.
> >
> > Can you please run it past ops (especially the 404 vs. 204 part)?
> >
> > Oliver -- the issue is that we'd like to figure out a way to provide
> > accurate views of the media files; because of client side caching, we
> > can't use the current requests. But your point is a good one -- we'll
> > need to add this to the PV definition.
> >
> > -Toby
> >
> > On Thu, Feb 5, 2015 at 5:18 AM, Oliver Keyes <oke...@wikimedia.org> wrote:
> >>
> >> A nice theory, but if they appear in the webrequest table (presumably
> >> they would, and we're not creating an entirely new set of varnishes
> >> for the transmission of dummy images?) they have to be factored in.
> >> Again, however, the new definition automatically filters them by
> >> checking the webrequest source and MIME type, so this is not a
> >> problem, as I originally stated.
> >>
> >> On 5 February 2015 at 08:10, Erik Zachte <ezac...@wikimedia.org> wrote:
> >> > Oliver, this is not about pageviews, but about media file views.
> >> >
> >> >
> >> >
> >> > These will be collected and dumped separately, as per
> >> > https://www.mediawiki.org/wiki/Requests_for_comment/Media_file_request_counts .
> >> >
> >> >
> >> >
> >> > Erik
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > From: analytics-boun...@lists.wikimedia.org
> >> > [mailto:analytics-boun...@lists.wikimedia.org] On Behalf Of Nuria Ruiz
> >> > Sent: Wednesday, February 04, 2015 22:28
> >> > To: A mailing list for the Analytics Team at WMF and everybody who has
> >> > an interest in Wikipedia and analytics.
> >> > Subject: Re: [Analytics] Virtual file view hack for Media Viewer views
> >> >
> >> >
> >> >
> >> >> We would add a rule to Varnish to make sure it does not try to look up
> >> >> such requests in Swift but returns a 404 immediately.
> >> >
> >> > I bet ops would like it a lot better if this were a 204, and it kind of
> >> > makes sense, as that is the code used for beacons and such. Otherwise
> >> > they might get alarms on 404s increasing.
> >> >
> >> > On Wed, Feb 4, 2015 at 12:38 PM, Oliver Keyes <oke...@wikimedia.org> wrote:
> >> >
> >> > Not really; the new pageviews definition wouldn't include those files
> >> > anyway. It seems silly, though, to be deliberately generating a large
> >> > amount of automated noise and client requests for this :/.
> >> >
> >> >
> >> > On 4 February 2015 at 15:00, Gergo Tisza <gti...@wikimedia.org> wrote:
> >> >> Hi all,
> >> >>
> >> >> Erik Zachte is working on file view stats and is looking for a way to
> >> >> track Media Viewer image views (for which there is no 1:1 relation
> >> >> between server hits and actual image views); after some back and forth
> >> >> in https://phabricator.wikimedia.org/T86914 I proposed the following hack:
> >> >>
> >> >> Whenever the JavaScript code in MediaViewer determines that an image
> >> >> view happened (e.g. an image has been displayed for a certain amount of
> >> >> time), it makes a request to a certain fake image, say
> >> >>
> >> >> upload.wikimedia.org/wikipedia/commons/thumb/0/00/Virtual-imageview-<real image name>/<size>px-thumbnail.<ext>
> >> >>
> >> >> These hits can then be easily filtered from the varnish request logs and
> >> >> added to the normal requests. We would add a rule to Varnish to make sure
> >> >> it does not try to look up such requests in Swift but returns a 404
> >> >> immediately.
> >> >>
> >> >> This would be a temporary workaround until there is a proper way to
> >> >> log virtual image views, such as EventLogging with a non-SQL backend.
> >> >>
> >> >> Do you see any fundamental problem with this?
> >> >>
> >> >
> >> >> _______________________________________________
> >> >> Analytics mailing list
> >> >> Analytics@lists.wikimedia.org
> >> >> https://lists.wikimedia.org/mailman/listinfo/analytics
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Oliver Keyes
> >> > Research Analyst
> >> > Wikimedia Foundation
