Mimi Yin wrote:
Both you and Katie have asked me to think more about what kind of
metrics PPD would like to see. Here's a first pass at fulfilling
that request. The list below is very much a *wish list*.
What *can* we capture? What needs to be more clearly defined? Any
other ideas?
Thanks, Mimi, for this work.
The list you provided is a reasonable set of questions to ask. It's
a mix of stuff we have, easy stuff, hard stuff, and some "huh?"
stuff.
One next step I'll take is to cut-and-paste your list as-is into a new
PPD dashboard page. There, as a prototype, I'll go ahead and point to
the metric streams we do have, and prototype the ones that are missing
but feasible. When it's a little more fleshed out, we can refactor
the various dashboards and put the logical stuff together.
On metrics overall, there are still some looming issues: the "what
counts as a person" topic from my last email, some bits that will stay
missing until we add some yet-to-be-specified logging features, and
the need to develop a shared sense of the very short list of metrics
we should focus on most.
Next, going point-by-point on your list:
Better visibility into how Desktop users are using the service.
Cumulatively, over time...
# of unique user accounts
# of collections
# of items
# of unique publishers
# of unique subscribers
# of subscriptions
Many of these are available in aggregate, by counting the number of
rows in various database tables. Items, events, subscriptions, and
tickets are here:
http://dashboard.osafoundation.org/dashboard/hub
Let's spec out "unique publishers" and "unique subscribers" more; is
that "total number of people who have published a collection"? Mildly
tricky given the autocreation of the Hub OOTB. Maybe we want "total
number of accounts which have more than 1 collection"? Dunno quite
how to implement right now. "Unique subscribers" is "number of
accounts which have any subscriptions created"?
I don't think "unique" adds anything to the "# of user accounts",
right? I'll add one somewhere, though this is a bit outside the
"Desktop visibility" sphere. (I suppose most of the above are too).
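As a rough sketch of the two candidate definitions above ("accounts
with more than 1 collection" and "accounts with any subscriptions"),
assuming we can pull (account, collection) rows out of the database.
The row shapes, account names, and function names here are made up for
illustration and are not the real schema:

```python
from collections import Counter

# Hypothetical rows: (account, collection_id) for every owned collection,
# and (account, collection_id) for every subscription an account created.
collections = [
    ("alice", "c1"), ("alice", "c2"),
    ("bob", "c3"),  # only the auto-created collection
    ("carol", "c4"), ("carol", "c5"), ("carol", "c6"),
]
subscriptions = [("bob", "c1"), ("bob", "c4"), ("dave", "c2")]


def count_publishers(collection_rows, min_collections=2):
    """Number of accounts owning at least min_collections collections."""
    per_account = Counter(account for account, _ in collection_rows)
    return sum(1 for n in per_account.values() if n >= min_collections)


def count_subscribers(subscription_rows):
    """Number of accounts that have created at least one subscription."""
    return len({account for account, _ in subscription_rows})


print("publishers:", count_publishers(collections))
print("subscribers:", count_subscribers(subscriptions))
```

The min_collections threshold is the knob for the autocreation
problem: 2 means "more than 1 collection", so an account holding only
the auto-created collection is not counted as a publisher.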
# of syncs per day per user
Most "per X per Y" requests are going to be either tricky or in need
of some simplification. You don't really want, I assume, a list of
2000+ rows, with the number of syncs each user has done, produced each
day. Plus there will be a privacy issue/discussion each time we say
"per user". So what sort of simplification would be helpful? I'm not
sure what you'd do with an average syncs per day, anyway; is your
thought to get the classic insight into roughly how long users leave
Desktop running in a day?
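One possible simplification, sketched under the assumption that the
sync log can be reduced to one account name per sync for a given day:
collapse the per-user counts into a few summary numbers, so no
individual user's row is ever reported. The sample data and the
summarize_syncs name are hypothetical:

```python
from collections import Counter
from statistics import mean, median

# Hypothetical one-day sync log: one account name per sync request.
sync_events = ["alice"] * 48 + ["bob"] * 3 + ["carol"] * 12 + ["dave"] * 1


def summarize_syncs(events):
    """Collapse per-user sync counts into aggregate statistics."""
    per_user = sorted(Counter(events).values())
    if not per_user:
        return {"users": 0, "mean": 0, "median": 0, "max": 0}
    return {
        "users": len(per_user),
        "mean": mean(per_user),
        "median": median(per_user),
        "max": per_user[-1],
    }


print(summarize_syncs(sync_events))
```

Reporting the median next to the mean matters here, since a handful of
always-on clients can pull the mean well above what a typical user
does.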
Spread of # of collections per user account
Spread of # of items per collection
Histograms? This will be good info, though I'll first need to write
some additional framework stuff to collect and present (graph)
histogram-style reports.
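For the "spread" metrics, a minimal sketch of the bucketing step,
assuming we already have one count per account (or per collection).
The bucket boundaries and the sample counts are invented for
illustration:

```python
from bisect import bisect_right


def histogram(values, lower_bounds):
    """Count values per bucket.

    lower_bounds must be sorted ascending; bucket i covers values from
    lower_bounds[i] up to (not including) the next bound, and the last
    bucket is open-ended.
    """
    counts = [0] * len(lower_bounds)
    for v in values:
        i = bisect_right(lower_bounds, v) - 1
        if i < 0:
            raise ValueError(f"{v} is below the first bucket")
        counts[i] += 1
    return list(zip(lower_bounds, counts))


# Hypothetical "collections per account" counts.
collections_per_account = [1, 1, 1, 2, 3, 1, 7, 12, 2, 1]

for lower, n in histogram(collections_per_account, [1, 2, 5, 10]):
    print(f"{lower:>3}+ {'#' * n} ({n})")
```

The same function would serve for items per collection; only the
input list and the bucket boundaries change.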
# of collections per item
Hmmm, a nice inverse. Ok. Seems like low-hanging fruit if I do a
straight average across all items (not broken down by item).
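The straight average could look something like this, assuming a table
of (collection, item) membership rows is available. The row shape and
the function name are assumptions, not the actual schema:

```python
def mean_collections_per_item(memberships, total_items=None):
    """Average number of collections each item belongs to.

    memberships: iterable of (collection_id, item_id) rows.
    total_items: optional count of all items, including any that are
    in no collection at all; defaults to the items seen in memberships.
    """
    pairs = set(memberships)  # ignore duplicate rows
    if total_items is None:
        total_items = len({item for _, item in pairs})
    if total_items == 0:
        return 0.0
    return len(pairs) / total_items


# Hypothetical membership rows.
rows = [
    ("c1", "i1"), ("c2", "i1"),
    ("c1", "i2"),
    ("c1", "i3"), ("c2", "i3"), ("c3", "i3"),
]
print(mean_collections_per_item(rows))
print(mean_collections_per_item(rows, total_items=4))
```

Whether items that sit in no collection belong in the denominator is
a definition question; total_items is there so either answer can be
computed.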
For both: per collection and per user...
# of new items per day
# of items edited per day
# of edits per item per day
# of editors per item per day
# of editors per collection per day
All these will require additional server infrastructure and logging,
I'm pretty sure, to be able to start a meaningful analysis. Also, the
standard "per X" questions apply; you don't really want a table with
every collection and another table with every user, right? The "# of
editors" questions could use some additional dev exploration; offhand,
I'd have to think about what data we'd need to capture to produce that.
# of read/write versus read-only subscriptions
Hurm, ok. That sounds like a chunk of scripting to look up each kind,
given just the raw ticket string in the logs.
# of subscribers per collection
# collections per user
It seems like lots of the above is useful info for Hub users too. Do
you mean to collect the above specifically for Desktop users, split
off? I'm guessing it might wind up being difficult even to split the
two categories; for something like # of editors, we might wind up
with an aggregate of both Desktop and Hub users making edits on an item.
Better visibility into the users who are accessing the Hub UI directly
# of times per week Hub UI users visit Hub UI
Average session time per visit
# and %age of 1-time Hub UI users
These hinge on the complicated "what is a person" issues raised in my
recent email. It's currently tricky for me to separate tickets from
accounts for Hub use, although I think the aggregates are reasonably
accurate. Offhand, these are awesome questions, but they are daunting
to approach from an implementation perspective. Maybe I can find a
reasonable place to start with an implementation if I think about it
for a while.
Session time has been the utter bane of log analysis people since the
beginning of web servers. It's inherently a painful guess with some
tradeoffs. Some additional backend work will be needed to put a
session id where needed or to design a heuristic for creating one.
I'll see what happens if I use a regular stats package. Maybe it can
say something useful about average session time even if it does weird
things with the URL analysis itself.
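One common heuristic, sketched here as an assumption rather than a
decision: group each visitor's hits into a session until there is a
gap of more than 30 minutes. The visitor key, the 30-minute timeout,
and the function names are all placeholders:

```python
from collections import defaultdict

SESSION_GAP = 30 * 60  # seconds of inactivity that ends a session (assumed)


def sessions(hits, gap=SESSION_GAP):
    """Split hits into sessions.

    hits: iterable of (visitor_key, unix_timestamp) pairs.
    Returns a list of (visitor_key, start, end) tuples.
    """
    by_visitor = defaultdict(list)
    for visitor, ts in hits:
        by_visitor[visitor].append(ts)

    result = []
    for visitor, stamps in by_visitor.items():
        stamps.sort()
        start = prev = stamps[0]
        for ts in stamps[1:]:
            if ts - prev > gap:  # too long since the last hit: new session
                result.append((visitor, start, prev))
                start = ts
            prev = ts
        result.append((visitor, start, prev))
    return result


def average_session_seconds(session_list):
    """Mean of (end - start) over all sessions; 0.0 if there are none."""
    if not session_list:
        return 0.0
    return sum(end - start for _, start, end in session_list) / len(session_list)


# Hypothetical hits: visitor "a" makes two visits, visitor "b" one hit.
hits = [("a", 0), ("a", 600), ("a", 1200), ("a", 10000), ("a", 10300), ("b", 50)]
print(average_session_seconds(sessions(hits)))
```

This shows the tradeoff directly: a single-hit visit counts as zero
seconds, because the log says nothing about how long the last page was
actually read.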
# of Hub UI users that have subscriptions to Desktop collections
- How many of these have accounts? don't have accounts.
My first reaction is "a collection is a collection", that there's no
difference between a Desktop collection and a Hub collection. But I
guess you're thinking "a collection which is published into an account
used by a user that is primarily a Desktop user". Which makes sense
as a product question (acknowledging overlap between "Desktop users"
and "Hub users"). But technically, it doesn't seem quite feasible to
do this separation properly. To the second bit, I think all users
with subscriptions have an account; what did you mean by the second
line exactly?
# of Hub UI users that have subscriptions to collections published
from other clients? Which clients?
Another very hard one, it seems. It would be interesting to keep
historical information about the first client which created each
collection, but I think it's a backburner request, as it'd take a fair
bit of infrastructure for probably not a lot of payoff other than
being a prerequisite for this question.
# of Hub UI users without subscriptions
Can you wrap up with a little more detail about this "Desktop user" vs
"Hub UI user" split? Is the thought that each account should be put into
one of these buckets depending on past behavior? And which they use
"more"? (Recognizing that we'll never identify anonymous users
accessing via a ticket.)
Thanks for the list, Mimi. It's always helpful. A number of them
just scare me and I wonder how I'll ever make headway on them, but
it's certainly useful to have the list. I may not have answers right
now, but maybe something clever will come up like the "IP+browser pair
== 1 user" trick outlined in that previous email.
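For reference, the "IP+browser pair == 1 user" trick boils down to
counting distinct pairs. A minimal sketch, assuming each log hit has
already been parsed into an (ip, user_agent) tuple; the sample values
are invented:

```python
def estimate_users(hits):
    """Estimate distinct people as the number of unique (ip, user_agent) pairs."""
    return len({(ip, user_agent) for ip, user_agent in hits})


# Hypothetical parsed log hits.
hits = [
    ("10.0.0.1", "Firefox/2.0"),
    ("10.0.0.1", "Firefox/2.0"),  # same pair again: same "person"
    ("10.0.0.1", "Safari/419.3"),  # same IP, different browser
    ("10.0.0.2", "Firefox/2.0"),
]
print(estimate_users(hits))
```

It is only an estimate: two people behind one shared address with the
same browser collapse into one, and one person on two machines counts
twice.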
-- Jared
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Open Source Applications Foundation "General" mailing list
http://lists.osafoundation.org/mailman/listinfo/general