Re: [General] Updated Hub usage metric - PPD perspective

Jared Rhine Mon, 26 Nov 2007 15:49:11 -0800

Mimi Yin wrote:

Both you and Katie have asked me to think more about what kind of metrics PPD would like to see. Here's a first pass at fulfilling that request. The list below is very much a *wish list*.
What *can* we capture? What needs to be more clearly defined? Any other ideas?


Thanks, Mimi, for this work.

The list you provide is a reasonable set of questions to ask. It's a mix of stuff we have, easy stuff, hard stuff, and some "huh?" stuff.

One next step I'll take is to cut-and-paste your list as-is into a new PPD dashboard page. There, as a prototype, I'll go ahead and point to the metric streams we do have, and prototype the ones that are missing but feasible. When it's a little more fleshed out, we can refactor the various dashboards and put the logical stuff together.

For the overall issue of metrics, there are still some looming issues, such as the "what counts as a person" topic from my last email, some bits that will be missing until we add some yet-to-be-specified logging features, and developing a shared sense of the very short list of metrics we should focus on most.


Next, going point-by-point on your list:

Better visibility into how Desktop users are using the service.

Cumulatively, over time...

# of unique user accounts
# of collections
# of items
# of unique publishers
# of unique subscribers
# of subscriptions

Many of these are available in aggregate by counting the number of rows in various database tables. Items, events, subscriptions, and tickets are here:


http://dashboard.osafoundation.org/dashboard/hub

Let's spec out "unique publishers" and "unique subscribers" more; is that "total number of people who have published a collection"? Mildly tricky given the autocreation of the Hub OOTB. Maybe we want "total number of accounts which have more than 1 collection"? Dunno quite how to implement right now. "Unique subscribers" is "number of accounts which have any subscriptions created"?
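
A minimal sketch of those two candidate definitions, to make the spec question concrete. The input shapes (pairs of account name and collection id) and the function names are assumptions for illustration; they are not the actual Hub schema. The "more than 1 collection" rule is read here as a way to skip accounts whose only collection was created automatically.

```python
from collections import Counter


def unique_publishers(collections, min_collections=2):
    """Count accounts that own at least `min_collections` collections.

    `collections` is an iterable of (owner_account, collection_id) pairs
    (a hypothetical shape). The default of 2 implements the
    "more than 1 collection" rule.
    """
    per_owner = Counter(owner for owner, _ in collections)
    return sum(1 for n in per_owner.values() if n >= min_collections)


def unique_subscribers(subscriptions):
    """Count accounts that have created at least one subscription.

    `subscriptions` is an iterable of (subscriber_account, collection_id)
    pairs (a hypothetical shape).
    """
    return len({account for account, _ in subscriptions})
```

Passing `min_collections=1` gives the looser "has published anything at all" reading, so the two definitions can be compared on the same data.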

I don't think "unique" adds anything to the "# of user accounts", right? I'll add one somewhere, though this is a bit outside the "Desktop visibility" sphere. (I suppose most of the above are too.)

# of syncs per day per user

Most "per X per Y" requests are going to be either tricky or in need of some simplification. You don't really want, I assume, a list of 2000+ rows, with the number of syncs each user has done, produced each day. Plus there will be a privacy issue/discussion each time we say "per user". So what sort of simplification is helpful? I'm not sure what you'd do with an average syncs per day, anyway; is your thought to get the classic insight of about how long users leave Desktop running in a day?

Spread of # of collections per user account
Spread of # of items per collection

Histograms? This will be good info, though I'll first need to write some additional framework stuff to collect and present (graph) histogram-style reports.
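
As a rough illustration of what a histogram-style report could look like, here is a small sketch. It assumes the per-account (or per-collection) counts have already been pulled out as a plain list of integers; the function names are made up for this example and the text bars stand in for a real graph.

```python
from collections import Counter


def spread(counts):
    """Map each observed count to the number of entities with that count."""
    return dict(sorted(Counter(counts).items()))


def render(hist, width=40):
    """Return a text bar chart with one line per bucket."""
    peak = max(hist.values(), default=0)
    lines = []
    for bucket, n in hist.items():
        # Scale the bar to `width` characters, showing at least one mark.
        bar = "#" * max(1, round(n * width / peak))
        lines.append(f"{bucket:>4} | {bar} {n}")
    return "\n".join(lines)
```

For example, `print(render(spread([1, 1, 2, 5])))` would print one bar per distinct collection count.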

# of collections per item

Hmmm, a nice inverse. Ok. Seems like maybe a low-hanging fruit if I do a straight average of all items (not broken down by item).
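
A sketch of that straight average, assuming item-to-collection membership can be exported as (item id, collection id) pairs. That shape and the function name are hypothetical. Note that items belonging to no collection never appear in such an export, so they are not counted in the denominator.

```python
def avg_collections_per_item(memberships):
    """Average number of collections each item belongs to.

    `memberships` is an iterable of (item_id, collection_id) pairs.
    Duplicate pairs are ignored.
    """
    pairs = set(memberships)
    items = {item for item, _ in pairs}
    if not items:
        return 0.0
    return len(pairs) / len(items)
```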


For both: per collection and per user...
# of new items per day
# of items edited per day
# of edits per item per day
# of editors per item per day
# of editors per collection per day

All these will require additional server infrastructure and logging, I'm pretty sure, to be able to start a meaningful analysis. Also, the standard "per X" questions apply; you don't really want a table with every collection and another table with every user, right? The "# of editors" questions could use some additional dev exploration; offhand, I'd have to think about what data we'd need to capture to produce that.

# of read/write versus read-only subscriptions

Hurm, ok. That sounds like a chunk of scripting to look up each kind, given just the raw ticket string in the logs.

# of subscribers per collection
# collections per user

It seems like lots of the above is useful info for Hub users too. Do you mean to collect the above specifically for Desktop users, split off? I'm guessing it might wind up being difficult to split the two categories even; for something like # of editors, we might wind up with an aggregate of both Desktop and Hub users making edits on an item.

Better visibility into the users who are accessing the Hub UI directly

# of times per week Hub UI users visit Hub UI
Average session time per visit
# and %age of 1-time Hub UI users

These hinge on the complicated "what is a person" issues raised in my recent email. It's currently tricky for me to separate tickets from accounts for Hub use, although the aggregates are reasonably accurate I think. Offhand, these are awesome questions, but they are daunting to approach from an implementation perspective. Maybe I can find a reasonable place to start with an implementation if I think about it for a while.

Session time has been the utter bane of log analysis people since the beginning of web servers. It's inherently a painful guess with some tradeoffs. Some additional backend work will be needed to put a session id where needed or to design a heuristic for creating one. I'll see what happens if I use a regular stats package. Maybe it can say something useful about average session time even if it does weird things with the URL analysis itself.
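
One common heuristic, sketched here as an illustration and not as a decided design: treat a visitor's requests as one session until there is a long enough silence, then start a new session. The 30-minute cutoff is an assumption, as are the function names. The tradeoff shows up right away: a visit with a single request gets a session time of zero, so averages come out low.

```python
from datetime import timedelta


def sessions(timestamps, gap=timedelta(minutes=30)):
    """Split one visitor's request times into (start, end) sessions.

    A new session starts whenever the time since the previous request
    is longer than `gap`.
    """
    result = []
    for ts in sorted(timestamps):
        if result and ts - result[-1][1] <= gap:
            # Still inside the current session: extend its end time.
            result[-1] = (result[-1][0], ts)
        else:
            result.append((ts, ts))
    return result


def average_session_seconds(session_list):
    """Mean session length in seconds (0.0 for an empty list)."""
    if not session_list:
        return 0.0
    total = sum((end - start).total_seconds() for start, end in session_list)
    return total / len(session_list)
```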

# of Hub UI users that have subscriptions to Desktop collections
- How many of these have accounts? don't have accounts.

My first reaction is "a collection is a collection", that there's no difference between a Desktop collection and a Hub collection. But I guess you're thinking "a collection which is published into an account used by a user that is primarily a Desktop user". Which makes sense as a product question (acknowledging overlap between "Desktop users" and "Hub users"). But technically, it doesn't seem quite feasible to do this separation properly. To the second bit, I think all users with subscriptions have an account; what did you mean by the second line exactly?

# of Hub UI users that have subscriptions to collections published from other clients? Which clients?

Another very hard one, it seems. It would be interesting to keep historical information about the first client which created each collection, but I think it's a backburner request, as it'd take a fair bit of infrastructure for probably not a lot of payoff other than a prerequisite for this question.

# of Hub UI users without subscriptions

Can you wrap up with a little more detail about this "Desktop user" vs "Hub UI user"? Is the thought that each account should be split into one of these buckets depending on past behavior? And which they use "more"? (Recognizing that we'll never identify anonymous users accessing via a ticket.)

Thanks for the list, Mimi. It's always helpful. A number of them just scare me and I wonder how I'll ever make headway on them, but it's certainly useful to have the list. I may not have answers right now, but maybe something clever will come up like the "IP+browser pair == 1 user" trick outlined in that previous email.
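
For reference, a sketch of how the "IP+browser pair == 1 user" idea might be applied to log records, going only by the phrase itself; the previous email may define the trick differently. It assumes each request has already been parsed into a (day, IP address, user agent) tuple, which is a hypothetical shape. Users behind a shared proxy with identical browsers are merged and users who change address are split, so the result is an estimate.

```python
from collections import defaultdict


def visitors_per_day(requests):
    """Estimate distinct visitors per day from (day, ip, user_agent) tuples.

    Each distinct (ip, user_agent) pair seen on a day counts as one visitor.
    """
    seen = defaultdict(set)
    for day, ip, agent in requests:
        seen[day].add((ip, agent))
    return {day: len(pairs) for day, pairs in sorted(seen.items())}
```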


-- Jared
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Open Source Applications Foundation "General" mailing list
http://lists.osafoundation.org/mailman/listinfo/general
