Hey all,

(Sending this to the public list because it's more transparent and I'd
like people who think this data is useful to be able to shout out)

Erik has asked me to write an exploratory app for user-agent data. The
idea is to enable Product Managers and engineers to easily explore
what users use so they know what to support. I've thrown up an example
screenshot at http://ironholds.org/agents_example_screen.png (I'd
host it on Commons, inb4Dario, but I'm not sure of the copyright
status of the UI).

One side-effect of this is that we end up with files of common user
agents, split between {readers, editors} and {mobile, desktop}, parsed
and unparsed. I'd like to release these files. The reuse potential is
twofold: researchers and engineers can use the parsed files to see
what browser penetration looks like globally and which browsers (the
top 10, say) should be supported, and software engineers can use the
unparsed files to improve user-agent detection rates.

The privacy implications /should/ be minimal, because of how this data
is gathered. The editor data is gathered from the checkuser table,
globally, and automatically excludes any user agent used by fewer than
50 distinct usernames. The reader data is gathered from a month of
1:1000 sampled log files, and excludes any agent responsible for fewer
than 500 pageviews in a 24 hour period. That threshold applies to the
sampled logs, so, practically speaking, it's roughly 500,000 actual
pageviews.
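
For anyone who wants to sanity-check the thresholds, here's a rough
sketch of the filtering in Python. This is not the actual pipeline:
the function names and the input shapes, (user_agent, username) pairs
and (day, user_agent) pairs, are made up for illustration, and I'm
assuming a reader agent is kept if it reaches the threshold on at
least one day of the month.

```python
from collections import defaultdict

SAMPLING_RATE = 1000          # logs are sampled 1:1000
MIN_DISTINCT_USERS = 50       # editor threshold
MIN_SAMPLED_PAGEVIEWS = 500   # reader threshold, per 24 hour period


def filter_editor_agents(rows):
    """Keep agents used by at least MIN_DISTINCT_USERS distinct usernames.

    rows: iterable of (user_agent, username) pairs (hypothetical layout).
    Returns {user_agent: distinct_user_count}.
    """
    users_by_agent = defaultdict(set)
    for agent, username in rows:
        users_by_agent[agent].add(username)
    return {
        agent: len(users)
        for agent, users in users_by_agent.items()
        if len(users) >= MIN_DISTINCT_USERS
    }


def filter_reader_agents(rows):
    """Keep agents with at least MIN_SAMPLED_PAGEVIEWS sampled hits in a day.

    rows: iterable of (day, user_agent) pairs, one per sampled request
    (hypothetical layout). Assumption: one qualifying day is enough.
    Returns the set of kept user agents.
    """
    hits = defaultdict(int)
    for day, agent in rows:
        hits[(agent, day)] += 1
    return {
        agent
        for (agent, _day), count in hits.items()
        if count >= MIN_SAMPLED_PAGEVIEWS
    }


def estimated_actual_pageviews(sampled_count):
    """Scale a sampled count back up by the sampling rate."""
    return sampled_count * SAMPLING_RATE
```

With these numbers, estimated_actual_pageviews(500) gives 500000,
which is where the "500,000 pageviews" figure comes from.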

What do people think about making this a data release? Would people
get value from the data, as well as the tool?

-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
