Assuming this were public, I could use this data on seldom-edited wikis to
find out which editors likely have old browser/OS versions with
vulnerabilities that I could attack[1].  This would get easier and easier
the more dimensions you add to the data.

<re-reads>

OK.  The anonymization strategy of dropping records that represent < 50
distinct editors seems to address this concern.  50 editors is a lot, so
this data wouldn't be terribly useful for under-active wikis.  Then
again, if you just want to get a sense of what the dominant browser/OS pairs
are, they will likely represent > 50 unique editors on most projects.
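
For what it's worth, here is a rough sketch of the thresholding as I
understand it from Oliver's description below.  The function name, the
(user_agent, username) record layout and the default of 50 are my own
assumptions for illustration, not the actual pipeline:

```python
from collections import defaultdict


def agents_meeting_threshold(records, min_editors=50):
    """Return {user_agent: distinct_editor_count} for agents that are safe to release.

    `records` is an iterable of (user_agent, username) pairs (a hypothetical
    layout). Agents used by fewer than `min_editors` distinct usernames are
    dropped, so repeated edits by one user never help an agent pass the cut.
    """
    editors_by_agent = defaultdict(set)
    for user_agent, username in records:
        editors_by_agent[user_agent].add(username)

    return {
        user_agent: len(usernames)
        for user_agent, usernames in editors_by_agent.items()
        if len(usernames) >= min_editors
    }
```

Counting distinct usernames rather than rows is the point: a single prolific
editor with a rare browser still shows up as one editor and gets dropped.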

1. Props to Matt Flaschen and Dan Andreescu for helping me work through the
implications of that one.

On Tue, Mar 3, 2015 at 9:59 PM, Oliver Keyes <oke...@wikimedia.org> wrote:

> Yeah, makes sense.
>
> On 3 March 2015 at 20:38, Nuria Ruiz <nu...@wikimedia.org> wrote:
> >>Agreed. Do we have a way of syncing files to Labs yet?
> > No need to sync if file is available in an endpoint like
> > http://some-data-here
> >
> > On Tue, Mar 3, 2015 at 4:50 PM, Oliver Keyes <oke...@wikimedia.org> wrote:
> >>
> >> On 3 March 2015 at 19:35, Nuria Ruiz <nu...@wikimedia.org> wrote:
> >> >>Erik has asked me to write an exploratory app for user-agent data. The
> >> >>idea is to enable Product Managers and engineers to easily explore
> >> >>what users use so they know what to support. I've thrown up an example
> >> >>screenshot at http://ironholds.org/agents_example_screen.png
> >> >
> >> > I cannot speak to the community's interest in this data, but for
> >> > developers and PMs we should make sure we have a solid way to update any
> >> > data we put up. User agent data is outdated as soon as a new version of
> >> > Android or iOS is released, a new popular phone comes along, or a popular
> >> > browser ships a new auto-update. Not only that, if we make changes to, say,
> >> > redirect all iPad users to the desktop site, we want to assess the effect
> >> > of those changes as soon as possible. A monthly update will be a must.
> >> > Also, distinguishing between browser percentages on the desktop site versus
> >> > the mobile site versus apps is a must for this data to be really useful
> >> > for PMs and developers (especially for bug triage).
> >> >
> >>
> >> Yes! However, I am addressing a specific ad-hoc request. If there is a
> >> need for this (I agree there is) I hope Toby and Kevin can eke out the
> >> time on the Analytics Engineering schedule to work on it; y'all are a
> >> lot better at infrastructure work than me :).
> >>
> >> >
> >> > We have a couple of backlog items to make monthly reports in this regard.
> >> > A UI on top of them will be superb.
> >> >
> >>
> >> Agreed. Do we have a way of syncing files to Labs yet? That's the
> >> biggest blocker. The UI doesn't care what the file contains as long as
> >> it's a TSV with a header row - I've deliberately built it so that
> >> things like the download links are dynamic and can change.
> >>
> >> >
> >> >
> >> >
> >> >
> >> > On Tue, Mar 3, 2015 at 1:05 PM, Oliver Keyes <oke...@wikimedia.org>
> >> > wrote:
> >> >>
> >> >> Hey all,
> >> >>
> >> >> (Sending this to the public list because it's more transparent and I'd
> >> >> like people who think this data is useful to be able to shout out)
> >> >>
> >> >> Erik has asked me to write an exploratory app for user-agent data. The
> >> >> idea is to enable Product Managers and engineers to easily explore
> >> >> what users use so they know what to support. I've thrown up an example
> >> >> screenshot at http://ironholds.org/agents_example_screen.png  (I'd
> >> >> host it on Commons, inb4Dario, but I'm not sure the copyright status
> >> >> of the UI)
> >> >>
> >> >> One side-effect of this is that we end up with files of common user
> >> >> agents, split between {readers,editors} and {mobile, desktop}, parsed
> >> >> and unparsed. I'd like to release these files. The reuse potential is
> >> >> twofold; researchers and engineers can use the parsed files to see
> >> >> what browser penetration looks like globally and what browsers should
> >> >> be supported at a top-10, and software engineers can use the unparsed
> >> >> files to improve detection rates.
> >> >>
> >> >> The privacy implications /should/ be minimal, because of how this data
> >> >> is gathered. The editor data is gathered from the checkuser table,
> >> >> globally, and automatically excludes any user agent used by fewer than
> >> >> 50 distinct usernames. The reader data is gathered from a month of
> >> >> 1:1000 sampled log files, and excludes any agent responsible for fewer
> >> >> than 500 pageviews in a 24 hour period (except, sampled. So,
> >> >> practically speaking, that's 500,000 pageviews)
> >> >>
> >> >> What do people think about making this a data release? Would people
> >> >> get value from the data, as well as the tool?
> >> >>
> >> >> --
> >> >> Oliver Keyes
> >> >> Research Analyst
> >> >> Wikimedia Foundation
> >> >>
> >> >> _______________________________________________
> >> >> Analytics mailing list
> >> >> Analytics@lists.wikimedia.org
> >> >> https://lists.wikimedia.org/mailman/listinfo/analytics
> >> >
> >> >
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Oliver Keyes
> >> Research Analyst
> >> Wikimedia Foundation
> >>
> >
> >
> >
> >
>
>
>
> --
> Oliver Keyes
> Research Analyst
> Wikimedia Foundation
>
>
