IIRC, it's actually desirable to have these PVs in hadoop so we can run the
queries in concert with mobile page views.

Erik Z -- thoughts?

-Toby

On Thu, Feb 5, 2015 at 12:46 PM, Dan Andreescu <dandree...@wikimedia.org>
wrote:

> Is this a potential solution to Oliver's concern:
>
> For "real" image views, add an X-Analytics header value of
> "real-view=true" to the request itself?
>
> If that's not feasible, we should look into using statsv for this (not
> sure how that works) or having this be a different kafka topic and not
> consumed into HDFS.
>
> On Thu, Feb 5, 2015 at 11:59 AM, Toby Negrin <tneg...@wikimedia.org>
> wrote:
>
>> I created a card -- modify as desired:
>>
>> https://trello.com/c/HMgVD4mz
>>
>> -Toby
>>
>> On Thu, Feb 5, 2015 at 8:51 AM, Toby Negrin <tneg...@wikimedia.org>
>> wrote:
>>
>>> It turns out that the media viewer (on desktop; don't know about mobile)
>>> does a lot of caching so just because an image is loaded from swift, it
>>> doesn't mean it is viewed. We'd like to provide more accurate stats to the
>>> GLAM folks, so yes, I think this needs to be added eventually. Let's leave
>>> it out of scope for now.
>>>
>>> -Toby
>>>
>>> On Thu, Feb 5, 2015 at 8:46 AM, Oliver Keyes <oke...@wikimedia.org>
>>> wrote:
>>>
>>>> We want to include these files in the pageview definition? :/.
>>>>
>>>> My point was more that we should try to avoid traffic-generating
>>>> requests that exist solely as a hack for analytics purposes; it's
>>>> artificial work for both users and us. If this is the only way of
>>>> doing things that's totally fine.
>>>>
>>>> On 5 February 2015 at 11:38, Toby Negrin <tneg...@wikimedia.org> wrote:
>>>> > Hi Gergo -- I like this idea.  As far as capacity, any EL-Hadoop based
>>>> > solution would be basically doing the same thing as you propose.
>>>> >
>>>> > Can you please run it past ops (especially the 404 vs. 204 part)?
>>>> >
>>>> > Oliver -- the issue is that we'd like to figure out a way to provide
>>>> > accurate views of the media files; because of client side caching, we
>>>> > can't use the current requests. But your point is a good one -- we'll
>>>> > need to add this to the PV definition.
>>>> >
>>>> > -Toby
>>>> >
>>>> > On Thu, Feb 5, 2015 at 5:18 AM, Oliver Keyes <oke...@wikimedia.org> wrote:
>>>> >>
>>>> >> A nice theory, but if they appear in the webrequest table (presumably
>>>> >> they would, and we're not creating an entirely new set of varnishes
>>>> >> for the transmission of dummy images?) they have to be factored in.
>>>> >> Again, however, the new definition automatically filters them by
>>>> >> checking the webrequest source and MIME type, so this is not a
>>>> >> problem, as I originally stated.
>>>> >>
>>>> >> On 5 February 2015 at 08:10, Erik Zachte <ezac...@wikimedia.org> wrote:
>>>> >> > Oliver, this is not about pageviews, but about media file views.
>>>> >> >
>>>> >> > These will be collected and dumped separately, as per
>>>> >> > https://www.mediawiki.org/wiki/Requests_for_comment/Media_file_request_counts
>>>> >> >
>>>> >> > Erik
>>>> >> >
>>>> >> > From: analytics-boun...@lists.wikimedia.org
>>>> >> > [mailto:analytics-boun...@lists.wikimedia.org] On Behalf Of Nuria Ruiz
>>>> >> > Sent: Wednesday, February 04, 2015 22:28
>>>> >> > To: A mailing list for the Analytics Team at WMF and everybody who has an
>>>> >> > interest in Wikipedia and analytics.
>>>> >> > Subject: Re: [Analytics] Virtual file view hack for Media Viewer views
>>>> >> >
>>>> >> >
>>>> >> >> We would add a rule to Varnish to make sure it does not try to look up
>>>> >> >> such requests in Swift but returns a 404 immediately.
>>>> >> >
>>>> >> > I bet ops would like it a lot better if this is a 204, and it kind of
>>>> >> > makes sense as it is the code used for beacons and such. Otherwise they
>>>> >> > might get alarms on 404s increasing.
>>>> >> >
>>>> >> > On Wed, Feb 4, 2015 at 12:38 PM, Oliver Keyes <oke...@wikimedia.org>
>>>> >> > wrote:
>>>> >> >
>>>> >> > Not really; the new pageviews definition wouldn't include those files
>>>> >> > anyway. It seems silly, though, to be deliberately generating a large
>>>> >> > amount of automated noise and client requests for this :/.
>>>> >> >
>>>> >> >
>>>> >> > On 4 February 2015 at 15:00, Gergo Tisza <gti...@wikimedia.org> wrote:
>>>> >> >> Hi all,
>>>> >> >>
>>>> >> >> Erik Zachte is working on file view stats and is looking for a way to
>>>> >> >> track Media Viewer image views (for which there is no 1:1 relation
>>>> >> >> between server hits and actual image views); after some back and forth
>>>> >> >> in https://phabricator.wikimedia.org/T86914 I proposed the following
>>>> >> >> hack:
>>>> >> >>
>>>> >> >> Whenever the JavaScript code in MediaViewer determines that an image
>>>> >> >> view happened (e.g. an image has been displayed for a certain amount of
>>>> >> >> time), it makes a request to a certain fake image, say
>>>> >> >>
>>>> >> >> upload.wikimedia.org/wikipedia/commons/thumb/0/00/Virtual-imageview-<real image name>/<size>px-thumbnail.<ext>
>>>> >> >>
>>>> >> >> These hits can then be easily filtered from the Varnish request logs
>>>> >> >> and added to the normal requests. We would add a rule to Varnish to make
>>>> >> >> sure it does not try to look up such requests in Swift but returns a 404
>>>> >> >> immediately.
>>>> >> >>
>>>> >> >> This would be a temporary workaround until there is a proper way to log
>>>> >> >> virtual image views, such as EventLogging with a non-SQL backend.
>>>> >> >>
>>>> >> >> Do you see any fundamental problem with this?
>>>> >> >>
>>>> >> >
>>>> >> >> _______________________________________________
>>>> >> >> Analytics mailing list
>>>> >> >> Analytics@lists.wikimedia.org
>>>> >> >> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>> >> >>
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> > --
>>>> >> > Oliver Keyes
>>>> >> > Research Analyst
>>>> >> > Wikimedia Foundation
>>>> >> >
>>>> >> --
>>>> >> Oliver Keyes
>>>> >> Research Analyst
>>>> >> Wikimedia Foundation
>>>> >>
>>>> --
>>>> Oliver Keyes
>>>> Research Analyst
>>>> Wikimedia Foundation
>>>>

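For readers following the proposal quoted above, here is a rough sketch of how the virtual image-view path could be built on the client side and then recognised again when filtering request logs. It only illustrates the URL pattern from Gergo's message. The function names, the percent-encoding of the file name, and the regular expression are assumptions made for this example, not anything MediaViewer or the analytics pipeline is known to use.

```python
import re
from typing import Optional, Tuple
from urllib.parse import quote, unquote

# Path prefix copied from the example URL in the proposal.
# Everything else in this sketch is hypothetical.
VIRTUAL_PREFIX = "/wikipedia/commons/thumb/0/00/Virtual-imageview-"

_VIRTUAL_RE = re.compile(
    r"^" + re.escape(VIRTUAL_PREFIX)
    + r"(?P<name>[^/]+)/(?P<size>\d+)px-thumbnail\.(?P<ext>\w+)$"
)


def virtual_view_path(image_name: str, size: int, ext: str) -> str:
    """Build the fake thumbnail path for one image view.

    Assumption: the real image name is percent-encoded so it stays
    within a single path segment.
    """
    encoded = quote(image_name, safe="")
    return f"{VIRTUAL_PREFIX}{encoded}/{size}px-thumbnail.{ext}"


def parse_virtual_view(path: str) -> Optional[Tuple[str, int, str]]:
    """Return (image name, size, extension) for a virtual-view path.

    Returns None for any other path, such as a real thumbnail request.
    """
    match = _VIRTUAL_RE.match(path)
    if match is None:
        return None
    return unquote(match.group("name")), int(match.group("size")), match.group("ext")
```

A log filter along these lines would keep the rows for which `parse_virtual_view` returns a value and count them per image name, while ordinary thumbnail requests fall through as `None`. Whether the server answers such requests with a 404 or a 204 does not affect the matching.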