>I have to admit that I haven't read all of this rather lengthy thread, but
>why wouldn't we just track this with EventLogging?
I think a good use of EventLogging is tracking "events", not pageviews.
We do not need a capsule + schema + validation system to be able to count
pageviews. Plain requests would work fine; it is a much simpler use case.
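For example, counting the virtual image view hits Gergo proposed needs nothing more than a pattern match over request URLs. Here is a minimal sketch in Python. It assumes the URL layout from his proposal (including the fixed 0/00 directory) and simplifies log records to bare URL paths, whereas real webrequest records carry many more fields. The helper names parse_virtual_view and count_virtual_views are hypothetical and are not part of any existing Wikimedia pipeline.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# Assumed URL layout, taken from the proposal in this thread:
#   /wikipedia/commons/thumb/0/00/Virtual-imageview-<image name>/<size>px-thumbnail.<ext>
# Log records are simplified to plain URL paths (hypothetical simplification).
VIRTUAL_VIEW_RE = re.compile(
    r"^/wikipedia/commons/thumb/0/00/"
    r"Virtual-imageview-(?P<name>[^/]+)/"
    r"(?P<size>\d+)px-thumbnail\.(?P<ext>[A-Za-z0-9]+)$"
)


def parse_virtual_view(path: str) -> Optional[Tuple[str, int]]:
    """Return (real image name, width in px) for a virtual view hit, else None."""
    match = VIRTUAL_VIEW_RE.match(path)
    if match is None:
        return None
    return match.group("name"), int(match.group("size"))


def count_virtual_views(paths: Iterable[str]) -> Counter:
    """Count virtual view hits per real image name, ignoring all other requests."""
    counts: Counter = Counter()
    for path in paths:
        parsed = parse_virtual_view(path)
        if parsed is not None:
            name, _size = parsed
            counts[name] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "/wikipedia/commons/thumb/0/00/Virtual-imageview-Example.jpg/800px-thumbnail.jpg",
        "/wikipedia/commons/thumb/0/00/Virtual-imageview-Example.jpg/1280px-thumbnail.jpg",
        "/wikipedia/commons/thumb/a/a9/Example.jpg/220px-Example.jpg",  # ordinary thumbnail
    ]
    print(count_virtual_views(sample))  # Counter({'Example.jpg': 2})
```

Anything that does not match the pattern, such as an ordinary thumbnail request, is ignored. This means the beacon hits can be counted on their own or added to the normal file request counts, with no schema or validation layer involved.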


On Thu, Feb 5, 2015 at 3:16 PM, Oliver Keyes <oke...@wikimedia.org> wrote:

> Bandwidth, I imagine? 25M events is a lot of events on top of the
> existing throughput.
>
> On 5 February 2015 at 18:13, Ryan Kaldari <rkald...@wikimedia.org> wrote:
> > I have to admit that I haven't read all of this rather lengthy thread, but
> > why wouldn't we just track this with EventLogging? That would avoid all the
> > pitfalls of other possible solutions: dealing with caching, creating bogus
> > extra file requests, etc.
> >
> > On Thu, Feb 5, 2015 at 8:51 AM, Toby Negrin <tneg...@wikimedia.org> wrote:
> >>
> >> It turns out that the media viewer (on desktop; I don't know about mobile)
> >> does a lot of caching, so just because an image is loaded from Swift
> >> doesn't mean it is viewed. We'd like to provide more accurate stats to
> >> the GLAM folks, so yes, I think this needs to be added eventually. Let's
> >> leave it out of scope for now.
> >>
> >> -Toby
> >>
> >> On Thu, Feb 5, 2015 at 8:46 AM, Oliver Keyes <oke...@wikimedia.org> wrote:
> >>>
> >>> We want to include these files in the pageview definition? :/.
> >>>
> >>> My point was more that we should try to avoid traffic-generating
> >>> requests that exist solely as a hack for analytics purposes; it's
> >>> artificial work for both users and us. If this is the only way of
> >>> doing things that's totally fine.
> >>>
> >>> On 5 February 2015 at 11:38, Toby Negrin <tneg...@wikimedia.org> wrote:
> >>> > Hi Gergo -- I like this idea. As far as capacity goes, any EL/Hadoop-based
> >>> > solution would basically be doing the same thing as you propose.
> >>> >
> >>> > Can you please run it past ops (especially the 404 vs. 204 part)?
> >>> >
> >>> > Oliver -- the issue is that we'd like to figure out a way to provide
> >>> > accurate views of the media files; because of client-side caching, we
> >>> > can't use the current requests. But your point is a good one -- we'll
> >>> > need to add this to the PV definition.
> >>> >
> >>> > -Toby
> >>> >
> >>> > On Thu, Feb 5, 2015 at 5:18 AM, Oliver Keyes <oke...@wikimedia.org> wrote:
> >>> >>
> >>> >> A nice theory, but if they appear in the webrequest table (presumably
> >>> >> they would, and we're not creating an entirely new set of varnishes
> >>> >> for the transmission of dummy images?) they have to be factored in.
> >>> >> Again, however, the new definition automatically filters them by
> >>> >> checking the webrequest source and MIME type, so this is not a
> >>> >> problem, as I originally stated.
> >>> >>
> >>> >> On 5 February 2015 at 08:10, Erik Zachte <ezac...@wikimedia.org> wrote:
> >>> >> > Oliver, this is not about pageviews, but about media file views.
> >>> >> >
> >>> >> > These will be collected and dumped separately, as per
> >>> >> > https://www.mediawiki.org/wiki/Requests_for_comment/Media_file_request_counts
> >>> >> >
> >>> >> > Erik
> >>> >> >
> >>> >> > From: analytics-boun...@lists.wikimedia.org
> >>> >> > [mailto:analytics-boun...@lists.wikimedia.org] On Behalf Of Nuria Ruiz
> >>> >> > Sent: Wednesday, February 04, 2015 22:28
> >>> >> > To: A mailing list for the Analytics Team at WMF and everybody who has
> >>> >> > an interest in Wikipedia and analytics.
> >>> >> > Subject: Re: [Analytics] Virtual file view hack for Media Viewer views
> >>> >> >
> >>> >> >>We would add a rule to Varnish to make sure it does not try to look up
> >>> >> >>such requests in Swift but returns a 404 immediately.
> >>> >> >
> >>> >> > I bet ops would like it a lot better if this were a 204, and it kind of
> >>> >> > makes sense, as that is the code used for beacons and such. Otherwise
> >>> >> > they might get alarms about 404s increasing.
> >>> >> >
> >>> >> > On Wed, Feb 4, 2015 at 12:38 PM, Oliver Keyes <oke...@wikimedia.org> wrote:
> >>> >> >
> >>> >> > Not really; the new pageviews definition wouldn't include those files
> >>> >> > anyway. It seems silly, though, to be deliberately generating a large
> >>> >> > amount of automated noise and client requests for this :/.
> >>> >> >
> >>> >> >
> >>> >> > On 4 February 2015 at 15:00, Gergo Tisza <gti...@wikimedia.org> wrote:
> >>> >> >> Hi all,
> >>> >> >>
> >>> >> >> Erik Zachte is working on file view stats and is looking for a way to
> >>> >> >> track Media Viewer image views (for which there is no 1:1 relation
> >>> >> >> between server hits and actual image views); after some back and forth
> >>> >> >> in https://phabricator.wikimedia.org/T86914 I proposed the following
> >>> >> >> hack:
> >>> >> >>
> >>> >> >> Whenever the JavaScript code in MediaViewer determines that an image
> >>> >> >> view happened (e.g. an image has been displayed for a certain amount of
> >>> >> >> time), it makes a request to a certain fake image, say
> >>> >> >>
> >>> >> >> upload.wikimedia.org/wikipedia/commons/thumb/0/00/Virtual-imageview-<real image name>/<size>px-thumbnail.<ext>
> >>> >> >>
> >>> >> >> These hits can then be easily filtered from the Varnish request logs
> >>> >> >> and added to the normal requests. We would add a rule to Varnish to
> >>> >> >> make sure it does not try to look up such requests in Swift but
> >>> >> >> returns a 404 immediately.
> >>> >> >>
> >>> >> >> This would be a temporary workaround until there is a proper way to
> >>> >> >> log virtual image views, such as EventLogging with a non-SQL backend.
> >>> >> >>
> >>> >> >> Do you see any fundamental problem with this?
> >>> >> >>
> >>> >> >
> >>> >> >> _______________________________________________
> >>> >> >> Analytics mailing list
> >>> >> >> Analytics@lists.wikimedia.org
> >>> >> >> https://lists.wikimedia.org/mailman/listinfo/analytics
> >>> >> >>
> >>> >> >
> >>> >> >
> >>> >> >
> >>> >> > --
> >>> >> > Oliver Keyes
> >>> >> > Research Analyst
> >>> >> > Wikimedia Foundation
> >>> >> >
> >>> >> >
> >>> >>
> >>> >
> >>>
> >>
> >
>