At the top of:
http://dashboard.osafoundation.org/dashboard
is a new metric: "Daily Hub visitors by Application". The rest of this
email describes what the metric actually measures, its limitations, and
how best to proceed towards getting an understanding of Chandler's
"regular users".
+ Hub usage in 5 buckets
This metric proposes to break down all incoming Hub traffic into 5 buckets:
- Chandler Desktop
- Web browser
- Mozilla Calendar(s) (Lightning, Sunbird)
- iCal 3.x
- Other (everything else)
+ Metric details and examples
How does this metric work? I count up all the IP + "HTTP User Agent"
pairs I see in the Chandler Server logs. This gives a list like:
94.199.224.144 Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;
rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6
201.51.72.4 Chandler/0.7.1 (Windows; U; i386; pt_BR)
201.78.236.17 Chandler/0.7.0.1 (Windows; U; i386; pt_BR)
204.15.0.186 Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.6)
Gecko/20070728 Thunderbird/2.0.0.6
204.50.113.28 Chandler/0.7.2-rc1 (Windows; U; i386; en_CA)
206.81.98.138 Chandler/0.7.0.1 (Windows; U; i386; en_US)
207.237.138.203 DAVKit/2.0 (10.5; wrbt) iCal 3.0
207.237.178.127 Chandler/0.7.1 (Macintosh; U; i386; en_US)
207.88.3.150 DAVKit/2.0 (10.5; wrbt) iCal 3.0
207.88.3.150 Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US;
rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9
I then go through this list and sort each user-agent into the buckets
shown on the graph. The "Other" bucket contains clients we recognize
explicitly but that don't have their own bucket: iCal 2, Evolution,
NetNewsWire, and a long tail of other apps.
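As a rough sketch of the bucketing step (the pattern list here is
hypothetical and much shorter than what the real analysis code uses; it
is not the actual implementation):

```python
import re

# Ordered list of (bucket name, pattern): first match wins.  Chandler
# comes first, and Mozilla Calendar before generic browsers, since a
# Lightning user-agent also looks Gecko-ish.
BUCKETS = [
    ("Chandler Desktop", re.compile(r"^Chandler/")),
    ("Mozilla Calendar", re.compile(r"Lightning|Sunbird")),
    ("iCal 3.x",         re.compile(r"iCal 3\.")),
    ("Web browser",      re.compile(r"Firefox|Safari|MSIE|Opera")),
    # "Other": clients we recognize explicitly but don't break out.
    ("Other",            re.compile(r"iCal 2\.|Evolution|NetNewsWire")),
]

def bucket_for(user_agent):
    """Return the bucket for a user-agent string, or None if it is
    unrecognized (unrecognized agents and robots are not counted)."""
    for name, pattern in BUCKETS:
        if pattern.search(user_agent):
            return name
    return None
```

For example, bucket_for("Chandler/0.7.1 (Windows; U; i386; pt_BR)")
yields "Chandler Desktop", while an unknown robot yields None and is
simply dropped.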
+ Undercounting and double-counting
If we don't recognize a user-agent, it isn't counted; the same goes for
robots and web spiders. This trait means we undercount total usage a
little, and new clients aren't counted until I add them to the logic.
Since we include IP address in the pair, we will undercount multiple
people using the same client behind a firewall, etc.
We will double-count people using the same client at home and at work.
We will double-count people using both Chandler and a web browser from
the same machine (there are a fair number of these). We will
double-count people who upgraded their apps so the version number
changed through the day.
We will undercount anyone using Chandler Server that's not on Chandler
Hub. We will undercount anyone using Chandler Desktop but not syncing
to the Hub.
If you use an app once in a day or 500 times, you get one "hit" in the
above metrics. The goal is to try to count "people"; IP+User-agent is
serving as a proxy for that. Due to our designs, there's no real way to
link people using Chandler Desktop to those following a ticketed URL to
the Hub, for instance.
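The dedup step can be sketched like this (hypothetical names; the real
analysis code differs): each distinct (IP, user-agent) pair counts
exactly once per day, however many requests it made.

```python
def daily_hits(log_entries):
    """Count each distinct (IP, user-agent) pair once per day.

    log_entries is an iterable of (ip, user_agent) tuples parsed from
    one day's access log; a set collapses repeat visits."""
    return len(set(log_entries))
```

Note how this also shows the double-counting: the same machine hitting
the Hub with both Chandler and a browser yields two pairs, hence two
"people".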
How does this metric fall short? In lots of ways. What we'd really
like to understand are classic marketing dimensions of recency,
frequency, and depth of interaction. In particular, we should establish
a better way to understand our "regular" users. We have no functional
definition of "regular", and no good way to measure it even if we had
one.
+ Writes vs reads as proxy for "regular user"?
One way to define "regular" user might be "writes vs reads".
Intuitively, a regular user might make changes to an event/todo, while a
passive user might just view occasionally or just sync in the
background. So if we had a way to separate "people who have made a
change this week" from "people who made no changes", then we might have
the beginnings of a "regular user" vs "casual user" metric.
I'd be curious to hear reactions to the "regular == makes changes" idea.
It might be an interesting metric to watch, but I worry that it might
exclude lots of great users getting substantial value from the Chandler
Project but who don't make lots of changes to their lists.
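To make the idea concrete, here is one way it could be approximated
from server logs. This is a hedged sketch, not how Chandler Server
actually logs today: it assumes the WebDAV/CalDAV and Atom endpoints
record the HTTP method per request, and the choice of which methods
count as "writes" is mine.

```python
# Methods assumed to imply a change (PUT/DELETE on items, etc.).
WRITE_METHODS = {"PUT", "POST", "DELETE", "PROPPATCH", "MKCALENDAR", "MOVE"}

def split_users_by_activity(requests):
    """Split users into writers and readers for a period (say, a week).

    requests: iterable of (user_key, http_method) tuples, where
    user_key is the same (ip, user_agent) proxy used above.  Anyone
    with at least one write lands in writers; everyone else is a
    reader."""
    writers, seen = set(), set()
    for user, method in requests:
        seen.add(user)
        if method.upper() in WRITE_METHODS:
            writers.add(user)
    return writers, seen - writers
```

The writers set would then be a candidate stand-in for "regular users",
with the caveat above that it misses people who read a lot but rarely
edit.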
+ Next steps?
I suspect I'm getting closer to having wrung out the info available to
me in current Chandler Server log files. It's not too hard to think of
great next-step metrics to measure, but the ideas need to be finessed
into what's actually feasible, and translated into an implementable
feature that's incrementally better than what we have now. In
particular, we'll probably proceed to work on a Chandler Server feature
to note what actions are actually occurring in the MC and Atom
protocols (the number of items changed, for instance).
+ Big dips in usage are a measurement artifact
A note on days like Nov 3rd and Oct 31st where the traffic seems to
plummet: this is an artifact of the measurement, not real. Because of a
failure to plan properly on my part, my metric analysis code only
properly works with 1 file per day. On days when I update the Hub,
multiple files are generated in production; only 1 gets counted. It is
obviously possible to fix this issue, but it will require a significant
amount of refactoring. It's on the plan, but a relnote for now. My
apologies.
-- Jared
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Open Source Applications Foundation "General" mailing list
http://lists.osafoundation.org/mailman/listinfo/general