Nuria & Erik: you're totally right, I keep forgetting this problem is more
complicated than I think.

So we should figure out how this statsv magic thing works and see if we can
use it here.
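
For reference while we compare options, here is a rough sketch of the virtual-image-view URL scheme from Gergo's proposal quoted below, written so that the same pattern both builds the path and picks such hits out of a request log. It is only an illustration under assumptions: the function names, the percent-encoding of the image name, and the exact regular expression are mine, not anything that exists in MediaViewer or in our log pipeline.

```python
import re
from urllib.parse import quote, unquote

# Prefix and directory taken from the example URL in the proposal.
# Everything else here (names, encoding choice) is a hypothetical sketch.
VIRTUAL_DIR = "/wikipedia/commons/thumb/0/00/"
VIRTUAL_PREFIX = "Virtual-imageview-"

# Matches <dir>Virtual-imageview-<real image name>/<size>px-thumbnail.<ext>
_VIRTUAL_PATH = re.compile(
    "^" + re.escape(VIRTUAL_DIR + VIRTUAL_PREFIX)
    + r"(?P<name>[^/]+)/(?P<size>\d+)px-thumbnail\.(?P<ext>\w+)$"
)


def virtual_view_path(image_name: str, size: int, ext: str) -> str:
    """Build the fake thumbnail path a client would request on an image view."""
    encoded = quote(image_name, safe="")  # assumption: name is percent-encoded
    return f"{VIRTUAL_DIR}{VIRTUAL_PREFIX}{encoded}/{size}px-thumbnail.{ext}"


def parse_virtual_view(path: str):
    """Return (image_name, size) for a virtual view hit, or None otherwise."""
    match = _VIRTUAL_PATH.match(path)
    if match is None:
        return None
    return unquote(match.group("name")), int(match.group("size"))
```

A log consumer could call parse_virtual_view() on each request path and count only the non-None results as media views, leaving ordinary thumbnail requests untouched.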

On Thu, Feb 5, 2015 at 4:41 PM, Nuria Ruiz <nu...@wikimedia.org> wrote:

> >[Oliver] My point was more that we should try to avoid traffic-generating
> >[Oliver] requests that exist solely as a hack for analytics purposes;
> >[Dan] Is this a potential solution to Oliver's concern:
>
> I disagree that we should be concerned about "beacons" to identify
> preloads. Beacons already exist for ads and stats, so using one to
> identify preloads doesn't seem far-fetched (I have certainly used similar
> code before and it did its job).
>
> Note that EL works in a similar fashion: it requests a "fake" image from
> varnish, to which we answer with a 204. The reason we have such code is
> that we do not have a specific endpoint or domain where requests of this
> type could go. Pretty much everything requested by our users and by
> ourselves ends up in varnish.
>
> On Thu, Feb 5, 2015 at 12:46 PM, Dan Andreescu <dandree...@wikimedia.org>
> wrote:
>
>> Is this a potential solution to Oliver's concern:
>>
>> For "real" image views, add an X-Analytics header value of
>> "real-view=true" to the request itself?
>>
>> If that's not feasible, we should look into using statsv for this (not
>> sure how that works) or having this be a different kafka topic and not
>> consumed into HDFS.
>>
>> On Thu, Feb 5, 2015 at 11:59 AM, Toby Negrin <tneg...@wikimedia.org>
>> wrote:
>>
>>> I created a card -- modify as desired:
>>>
>>> https://trello.com/c/HMgVD4mz
>>>
>>> -Toby
>>>
>>> On Thu, Feb 5, 2015 at 8:51 AM, Toby Negrin <tneg...@wikimedia.org>
>>> wrote:
>>>
>>>> It turns out that the media viewer (on desktop; don't know about
>>>> mobile) does a lot of caching so just because an image is loaded from
>>>> swift, it doesn't mean it is viewed. We'd like to provide more accurate
>>>> stats to the GLAM folks, so yes, I think this needs to be added eventually.
>>>> Let's leave it out of scope for now.
>>>>
>>>> -Toby
>>>>
>>>> On Thu, Feb 5, 2015 at 8:46 AM, Oliver Keyes <oke...@wikimedia.org>
>>>> wrote:
>>>>
>>>>> We want to include these files in the pageview definition? :/.
>>>>>
>>>>> My point was more that we should try to avoid traffic-generating
>>>>> requests that exist solely as a hack for analytics purposes; it's
>>>>> artificial work for both users and us. If this is the only way of
>>>>> doing things that's totally fine.
>>>>>
>>>>> On 5 February 2015 at 11:38, Toby Negrin <tneg...@wikimedia.org>
>>>>> wrote:
>>>>> > Hi Gergo -- I like this idea. As far as capacity, any EL-Hadoop based
>>>>> > solution would be basically doing the same thing as you propose.
>>>>> >
>>>>> > Can you please run it past ops (especially the 404 vs. 204 part)?
>>>>> >
>>>>> > Oliver -- the issue is that we'd like to figure out a way to provide
>>>>> > accurate views of the media files; because of client side caching, we
>>>>> > can't use the current requests. But your point is a good one -- we'll
>>>>> > need to add this to the PV definition.
>>>>> >
>>>>> > -Toby
>>>>> >
>>>>> > On Thu, Feb 5, 2015 at 5:18 AM, Oliver Keyes <oke...@wikimedia.org> wrote:
>>>>> >>
>>>>> >> A nice theory, but if they appear in the webrequest table (presumably
>>>>> >> they would, and we're not creating an entirely new set of varnishes
>>>>> >> for the transmission of dummy images?) they have to be factored in.
>>>>> >> Again, however, the new definition automatically filters them by
>>>>> >> checking the webrequest source and MIME type, so this is not a
>>>>> >> problem, as I originally stated.
>>>>> >>
>>>>> >> On 5 February 2015 at 08:10, Erik Zachte <ezac...@wikimedia.org> wrote:
>>>>> >> > Oliver, this is not about pageviews, but about media file views.
>>>>> >> >
>>>>> >> > These will be collected and dumped separately, as per
>>>>> >> > https://www.mediawiki.org/wiki/Requests_for_comment/Media_file_request_counts
>>>>> >> >
>>>>> >> > Erik
>>>>> >> >
>>>>> >> > From: analytics-boun...@lists.wikimedia.org
>>>>> >> > [mailto:analytics-boun...@lists.wikimedia.org] On Behalf Of Nuria Ruiz
>>>>> >> > Sent: Wednesday, February 04, 2015 22:28
>>>>> >> > To: A mailing list for the Analytics Team at WMF and everybody who has
>>>>> >> > an interest in Wikipedia and analytics.
>>>>> >> > Subject: Re: [Analytics] Virtual file view hack for Media Viewer views
>>>>> >> >
>>>>> >> >> We would add a rule to Varnish to make sure it does not try to look up
>>>>> >> >> such requests in Swift but returns a 404 immediately.
>>>>> >> >
>>>>> >> > I bet ops would like it a lot better if this were a 204, and it kind of
>>>>> >> > makes sense, as that is the code used for beacons and such. Otherwise
>>>>> >> > they might get alarms about 404s increasing.
>>>>> >> >
>>>>> >> > On Wed, Feb 4, 2015 at 12:38 PM, Oliver Keyes <oke...@wikimedia.org>
>>>>> >> > wrote:
>>>>> >> >
>>>>> >> > Not really; the new pageviews definition wouldn't include those files
>>>>> >> > anyway. It seems silly, though, to be deliberately generating a large
>>>>> >> > amount of automated noise and client requests for this :/.
>>>>> >> >
>>>>> >> >
>>>>> >> > On 4 February 2015 at 15:00, Gergo Tisza <gti...@wikimedia.org> wrote:
>>>>> >> >> Hi all,
>>>>> >> >>
>>>>> >> >> Erik Zachte is working on file view stats and is looking for a way to
>>>>> >> >> track Media Viewer image views (for which there is no 1:1 relation
>>>>> >> >> between server hits and actual image views); after some back and forth
>>>>> >> >> in https://phabricator.wikimedia.org/T86914 I proposed the following
>>>>> >> >> hack:
>>>>> >> >>
>>>>> >> >> Whenever the JavaScript code in MediaViewer determines that an image
>>>>> >> >> view happened (e.g. an image has been displayed for a certain amount of
>>>>> >> >> time), it makes a request to a certain fake image, say
>>>>> >> >>
>>>>> >> >> upload.wikimedia.org/wikipedia/commons/thumb/0/00/Virtual-imageview-<real image name>/<size>px-thumbnail.<ext>
>>>>> >> >>
>>>>> >> >> These hits can then be easily filtered from the varnish request logs and
>>>>> >> >> added to the normal requests. We would add a rule to Varnish to make
>>>>> >> >> sure it does not try to look up such requests in Swift but returns a
>>>>> >> >> 404 immediately.
>>>>> >> >>
>>>>> >> >> This would be a temporary workaround until there is a proper way to log
>>>>> >> >> virtual image views, such as EventLogging with a non-SQL backend.
>>>>> >> >>
>>>>> >> >> Do you see any fundamental problem with this?
>>>>> >> >>
>>>>> >> >
>>>>> >> >> _______________________________________________
>>>>> >> >> Analytics mailing list
>>>>> >> >> Analytics@lists.wikimedia.org
>>>>> >> >> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>> >> >>
>>>>> >> >
>>>>> >> >
>>>>> >> >
>>>>> >> > --
>>>>> >> > Oliver Keyes
>>>>> >> > Research Analyst
>>>>> >> > Wikimedia Foundation
>>>>> >> >
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >> Oliver Keyes
>>>>> >> Research Analyst
>>>>> >> Wikimedia Foundation
>>>>> >>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Oliver Keyes
>>>>> Research Analyst
>>>>> Wikimedia Foundation