Just an FYI that Legal have approved this release under the
anonymisation procedures we've set out (thanks Michelle!) on the
condition that Dario, too, is comfortable with them. Dario?

On 4 March 2015 at 17:16, Oliver Keyes <oke...@wikimedia.org> wrote:
> So it's distinct people, globally - and I deliberately made it woolly
> by operating over username, which means the threshold is fuzzy
> (i.e., at a minimum it's 50; at a maximum it's 50x[number of wikis]).
>
> It's very deliberately dimension-free: user_agent,
> edit_count_in_non_specified_90_day_period, and that's it.
>
> On 4 March 2015 at 17:12, Aaron Halfaker <ahalfa...@wikimedia.org> wrote:
>> Assuming this was public, I could use this data on seldom edited Wikis to
>> find out which editors likely have old browser/OS versions with
>> vulnerabilities that I could attack[1].  This would be easier and easier the
>> more dimensions you add to the data.
>>
>> <re-reads>
>>
>> OK.  The anonymization strategy for dropping records that represent < 50
>> distinct editors seems to address this concern.  50 editors is a lot.  So
>> this data wouldn't be too terribly useful for under-active wikis.  Then
>> again, if you just want to get a sense for what the dominant browser/OS pairs
>> are, then they will likely represent > 50 unique editors on most projects.
>>
>> 1. Props to Matt Flaschen and Dan Andreescu for helping me work through the
>> implications of that one.
>>
>> On Tue, Mar 3, 2015 at 9:59 PM, Oliver Keyes <oke...@wikimedia.org> wrote:
>>>
>>> Yeah, makes sense.
>>>
>>> On 3 March 2015 at 20:38, Nuria Ruiz <nu...@wikimedia.org> wrote:
>>> >>Agreed. Do we have a way of syncing files to Labs yet?
>>> > No need to sync if file is available in an endpoint like
>>> > http://some-data-here
>>> >
>>> > On Tue, Mar 3, 2015 at 4:50 PM, Oliver Keyes <oke...@wikimedia.org> wrote:
>>> >>
>>> >> On 3 March 2015 at 19:35, Nuria Ruiz <nu...@wikimedia.org> wrote:
>>> >> >> Erik has asked me to write an exploratory app for user-agent
>>> >> >> data. The idea is to enable Product Managers and engineers to
>>> >> >> easily explore what users use so they know what to support. I've
>>> >> >> thrown up an example screenshot at
>>> >> >> http://ironholds.org/agents_example_screen.png
>>> >> >
>>> >> > I cannot speak to the community's interest in this data, but for
>>> >> > developers and PMs we should make sure we have a solid way to
>>> >> > update any data we put up. User-agent data is outdated as soon as
>>> >> > a new version of Android or iOS is released, a new popular phone
>>> >> > comes along, or a popular browser ships an auto-update. Not only
>>> >> > that: if we make changes to, say, redirect all iPad users to the
>>> >> > desktop site, we want to assess the effect of those changes as
>>> >> > soon as possible. A monthly update will be a must. Distinguishing
>>> >> > between browser percentages on the desktop site versus the mobile
>>> >> > site versus the apps is also a must for this data to be really
>>> >> > useful for PMs and developers (especially for bug triage).
>>> >> >
>>> >>
>>> >> Yes! However, I am addressing a specific ad-hoc request. If there is a
>>> >> need for this (I agree there is) I hope Toby and Kevin can eke out the
>>> >> time on the Analytics Engineering schedule to work on it; y'all are a
>>> >> lot better at infrastructure work than me :).
>>> >>
>>> >> >
>>> >> > We have a couple of backlog items to produce monthly reports in
>>> >> > this regard. A UI on top of them would be superb.
>>> >> >
>>> >>
>>> >> Agreed. Do we have a way of syncing files to Labs yet? That's the
>>> >> biggest blocker. The UI doesn't care what the file contains as long as
>>> >> it's a TSV with a header row - I've deliberately built it so that
>>> >> things like the download links are dynamic and can change.
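As an illustration of the "TSV with a header row" contract mentioned above, here is a minimal Python sketch of a reader that takes its column names from whatever the header row contains. The function name `read_tsv`, the column names, and the sample row are hypothetical and used only for illustration; this is not the app's actual code.

```python
import csv
import io


def read_tsv(handle):
    """Return (column names, rows as dicts) from a tab-separated file.

    The first line is treated as the header row, so the caller does not
    need to know the columns in advance.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    rows = list(reader)
    return reader.fieldnames, rows


# Hypothetical sample data; a real file would be opened with open(path, newline="").
sample = io.StringIO("user_agent\tedit_count\nExampleBrowser 1.0\t1200\n")
columns, rows = read_tsv(sample)
```

Because the columns come from the header, a file with different or extra columns would be read the same way, which is presumably what lets the UI stay indifferent to the file's contents.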
>>> >>
>>> >> >
>>> >> > On Tue, Mar 3, 2015 at 1:05 PM, Oliver Keyes <oke...@wikimedia.org> wrote:
>>> >> >>
>>> >> >> Hey all,
>>> >> >>
>>> >> >> (Sending this to the public list because it's more transparent
>>> >> >> and I'd like people who think this data is useful to be able to
>>> >> >> shout out)
>>> >> >>
>>> >> >> Erik has asked me to write an exploratory app for user-agent
>>> >> >> data. The idea is to enable Product Managers and engineers to
>>> >> >> easily explore what users use so they know what to support. I've
>>> >> >> thrown up an example screenshot at
>>> >> >> http://ironholds.org/agents_example_screen.png  (I'd host it on
>>> >> >> Commons, inb4Dario, but I'm not sure of the copyright status of
>>> >> >> the UI)
>>> >> >>
>>> >> >> One side-effect of this is that we end up with files of common
>>> >> >> user agents, split between {readers, editors} and {mobile,
>>> >> >> desktop}, parsed and unparsed. I'd like to release these files.
>>> >> >> The reuse potential is twofold: researchers and engineers can use
>>> >> >> the parsed files to see what browser penetration looks like
>>> >> >> globally and what browsers should be supported at a top-10, and
>>> >> >> software engineers can use the unparsed files to improve
>>> >> >> detection rates.
>>> >> >>
>>> >> >> The privacy implications /should/ be minimal, because of how this
>>> >> >> data is gathered. The editor data is gathered from the checkuser
>>> >> >> table, globally, and automatically excludes any user agent used
>>> >> >> by fewer than 50 distinct usernames. The reader data is gathered
>>> >> >> from a month of 1:1000 sampled log files, and excludes any agent
>>> >> >> responsible for fewer than 500 pageviews in a 24 hour period
>>> >> >> (except, sampled. So, practically speaking, that's 500,000
>>> >> >> pageviews)
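The two thresholds described above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names, the `(username, user_agent)` row shape, and the constants' names are hypothetical, and the real extraction from the checkuser table and the sampled logs is not shown in this thread.

```python
from collections import defaultdict

MIN_DISTINCT_USERNAMES = 50   # editor threshold quoted in the thread
MIN_SAMPLED_PAGEVIEWS = 500   # reader threshold, counted in the sampled logs
SAMPLING_RATE = 1000          # the logs are sampled 1:1000


def filter_editor_agents(rows, minimum=MIN_DISTINCT_USERNAMES):
    """Return {user_agent: distinct username count}, dropping rare agents.

    rows is an iterable of (username, user_agent) pairs (a hypothetical
    shape). Agents used by fewer than `minimum` distinct usernames are
    excluded from the result.
    """
    usernames_by_agent = defaultdict(set)
    for username, user_agent in rows:
        usernames_by_agent[user_agent].add(username)
    return {
        agent: len(names)
        for agent, names in usernames_by_agent.items()
        if len(names) >= minimum
    }


def estimated_pageviews(sampled_count, sampling_rate=SAMPLING_RATE):
    """Scale a count taken from 1:N sampled logs to an estimated total."""
    return sampled_count * sampling_rate
```

Counting a set of usernames per agent means repeated rows from the same username do not inflate the count, and `estimated_pageviews(500)` gives the 500,000 figure mentioned above.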
>>> >> >>
>>> >> >> What do people think about making this a data release? Would people
>>> >> >> get value from the data, as well as the tool?
>>> >> >>
>>> >> >> --
>>> >> >> Oliver Keyes
>>> >> >> Research Analyst
>>> >> >> Wikimedia Foundation
>>> >> >>
>>> >> >> _______________________________________________
>>> >> >> Analytics mailing list
>>> >> >> Analytics@lists.wikimedia.org
>>> >> >> https://lists.wikimedia.org/mailman/listinfo/analytics
>>> >> >
>>> >>
>>> >
>>>
>>
>



-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation

