Erik,

Again: I am not, and at no point in this conversation have been,
concerned about the pageview definition.

(Repeat no. 5)

On 5 February 2015 at 17:28, Erik Zachte <ezac...@wikimedia.org> wrote:
> I'm not sure why a beacon would have to be a dummy html file, thus confusing
> PV stats.
>
> Could it not be a dummy image request, more in line with the one-pixel
> images that are often used?
>
> This way Oliver can relax, go on vacation for real, without keeping a close
> watch over PV definitions.
>
>
>
> From: analytics-boun...@lists.wikimedia.org
> [mailto:analytics-boun...@lists.wikimedia.org] On Behalf Of Dan Andreescu
> Sent: Thursday, February 05, 2015 22:43
>
>
> To: A mailing list for the Analytics Team at WMF and everybody who has an
> interest in Wikipedia and analytics.
> Subject: Re: [Analytics] Virtual file view hack for Media Viewer views
>
>
>
> Nuria & Erik: you're totally right, I keep forgetting this problem is more
> complicated than I think.
>
>
>
> So we should figure out how this statsv magic thing works and see if we can
> use it here.
>
>
>
> On Thu, Feb 5, 2015 at 4:41 PM, Nuria Ruiz <nu...@wikimedia.org> wrote:
>
>>[Oliver] My point was more that we should try to avoid traffic-generating
>>[Oliver] requests that exist solely as a hack for analytics purposes;
>>[Dan] Is this a potential solution to Oliver's concern:
>
> I disagree that we should be concerned about "beacons" to identify preloads.
> Just as beacons exist for ads or stats, using one to identify preloads doesn't
> seem far-fetched (I have certainly used similar code before and it did its
> job).
>
>
>
> Note that EL works in a similar fashion: it requests a "fake" image from
> varnish, to which we answer with a 204. The reason we have such code is that
> we do not have a specific endpoint or domain where requests of this type
> could go. Pretty much everything requested by our users and ourselves ends
> up in varnish.
>
> On Thu, Feb 5, 2015 at 12:46 PM, Dan Andreescu <dandree...@wikimedia.org>
> wrote:
>
> Is this a potential solution to Oliver's concern:
>
>
>
> For "real" image views, add an X-Analytics header value of "real-view=true"
> to the request itself?
>
>
>
> If that's not feasible, we should look into using statsv for this (not sure
> how that works) or having this be a different kafka topic and not consumed
> into HDFS.
>
>
>
> On Thu, Feb 5, 2015 at 11:59 AM, Toby Negrin <tneg...@wikimedia.org> wrote:
>
> I created a card -- modify as desired:
>
>
>
> https://trello.com/c/HMgVD4mz
>
>
>
> -Toby
>
>
>
> On Thu, Feb 5, 2015 at 8:51 AM, Toby Negrin <tneg...@wikimedia.org> wrote:
>
> It turns out that the media viewer (on desktop; don't know about mobile)
> does a lot of caching so just because an image is loaded from swift, it
> doesn't mean it is viewed. We'd like to provide more accurate stats to the
> GLAM folks, so yes, I think this needs to be added eventually. Let's leave
> it out of scope for now.
>
>
>
> -Toby
>
>
>
> On Thu, Feb 5, 2015 at 8:46 AM, Oliver Keyes <oke...@wikimedia.org> wrote:
>
> We want to include these files in the pageview definition? :/.
>
> My point was more that we should try to avoid traffic-generating
> requests that exist solely as a hack for analytics purposes; it's
> artificial work for both users and us. If this is the only way of
> doing things that's totally fine.
>
>
> On 5 February 2015 at 11:38, Toby Negrin <tneg...@wikimedia.org> wrote:
>> Hi Gergo -- I like this idea.  As far as capacity, any EL-Hadoop based
>> solution would be basically doing the same thing as you propose.
>>
>> Can you please run it past ops (especially the 404 v 204 part)?
>>
>> Oliver -- the issue is that we'd like to figure out a way to provide
>> accurate views of the media files; because of client side caching, we
>> can't use the current requests. But your point is a good one -- we'll
>> need to add this to the PV definition.
>>
>> -Toby
>>
>> On Thu, Feb 5, 2015 at 5:18 AM, Oliver Keyes <oke...@wikimedia.org> wrote:
>>>
>>> A nice theory, but if they appear in the webrequest table (presumably
>>> they would, and we're not creating an entirely new set of varnishes
>>> for the transmission of dummy images?) they have to be factored in.
>>> Again, however, the new definition automatically filters them by
>>> checking the webrequest source and MIME type, so this is not a
>>> problem, as I originally stated.
>>>
>>> On 5 February 2015 at 08:10, Erik Zachte <ezac...@wikimedia.org> wrote:
>>> > Oliver, this is not about pageviews, but about media file views.
>>> >
>>> >
>>> >
>>> > These will be collected and dumped separately, as per
>>> >
>>> >
>>> > https://www.mediawiki.org/wiki/Requests_for_comment/Media_file_request_counts
>>> > .
>>> >
>>> >
>>> >
>>> > Erik
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > From: analytics-boun...@lists.wikimedia.org
>>> > [mailto:analytics-boun...@lists.wikimedia.org] On Behalf Of Nuria Ruiz
>>> > Sent: Wednesday, February 04, 2015 22:28
>>> > To: A mailing list for the Analytics Team at WMF and everybody who has
>>> > an interest in Wikipedia and analytics.
>>> > Subject: Re: [Analytics] Virtual file view hack for Media Viewer views
>>> >
>>> >
>>> >
>>> >>We would add a rule to Varnish to make sure it does not try to look up
>>> >> such requests in Swift but returns a 404 immediately.
>>> >
>>> > I bet ops would like it a lot better if this is a 204, and it kind of
>>> > makes sense as it is the code used for beacons and such. Otherwise they
>>> > might get alarms on 404s increasing.
>>> >
>>> > On Wed, Feb 4, 2015 at 12:38 PM, Oliver Keyes <oke...@wikimedia.org>
>>> > wrote:
>>> >
>>> > Not really; the new pageviews definition wouldn't include those files
>>> > anyway. It seems silly, though, to be deliberately generating a large
>>> > amount of automated noise and client requests for this :/.
>>> >
>>> >
>>> > On 4 February 2015 at 15:00, Gergo Tisza <gti...@wikimedia.org> wrote:
>>> >> Hi all,
>>> >>
>>> >> Erik Zachte is working on file view stats and is looking for a way to
>>> >> track Media Viewer image views (for which there is no 1:1 relation
>>> >> between server hits and actual image views); after some back and forth
>>> >> in https://phabricator.wikimedia.org/T86914 I proposed the following
>>> >> hack:
>>> >>
>>> >> whenever the javascript code in MediaViewer determines that an image
>>> >> view happened (e.g. an image has been displayed for a certain amount of
>>> >> time), it makes a request to a certain fake image, say
>>> >>
>>> >> upload.wikimedia.org/wikipedia/commons/thumb/0/00/Virtual-imageview-<real image name>/<size>px-thumbnail.<ext>
>>> >>
>>> >> These hits can then be easily filtered from the varnish request logs and
>>> >> added to the normal requests. We would add a rule to Varnish to make sure
>>> >> it does not try to look up such requests in Swift but returns a 404
>>> >> immediately.
>>> >>
>>> >> This would be a temporary workaround until there is a proper way to
>>> >> log virtual image views, such as EventLogging with a non-SQL backend.
>>> >>
>>> >> Do you see any fundamental problem with this?
>>> >>
>>> >
>>> >> _______________________________________________
>>> >> Analytics mailing list
>>> >> Analytics@lists.wikimedia.org
>>> >> https://lists.wikimedia.org/mailman/listinfo/analytics
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Oliver Keyes
>>> > Research Analyst
>>> > Wikimedia Foundation
>>>
>>
>



-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
