At the top of:

  http://dashboard.osafoundation.org/dashboard

is a new metric: "Daily Hub visitors by Application". The rest of this email describes what the metric actually measures, its limitations, and how we might proceed toward understanding Chandler's "regular users".

+ Hub usage in 5 buckets

This metric proposes to break down all incoming Hub traffic into 5 buckets:

- Chandler Desktop
- Web browser
- Mozilla Calendar(s) (Lightning, Sunbird)
- iCal 3.x
- Other (everything else)

+ Metric details and examples

How does this metric work? I count up all the IP + "HTTP User Agent" pairs I see in the Chandler Server logs. This gives a list like:

  94.199.224.144 Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6
  201.51.72.4 Chandler/0.7.1 (Windows; U; i386; pt_BR)
  201.78.236.17 Chandler/0.7.0.1 (Windows; U; i386; pt_BR)
  204.15.0.186 Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.6) Gecko/20070728 Thunderbird/2.0.0.6
  204.50.113.28 Chandler/0.7.2-rc1 (Windows; U; i386; en_CA)
  206.81.98.138 Chandler/0.7.0.1 (Windows; U; i386; en_US)
  207.237.138.203 DAVKit/2.0 (10.5; wrbt) iCal 3.0
  207.237.178.127 Chandler/0.7.1 (Macintosh; U; i386; en_US)
  207.88.3.150 DAVKit/2.0 (10.5; wrbt) iCal 3.0
  207.88.3.150 Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9

So then I go through this list and sort each user-agent into the buckets shown on the graph. The "Other" bucket holds things we recognize explicitly but that don't get their own bucket: iCal 2, Evolution, NetNewsWire, and a long tail of other apps.
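
To make the pipeline concrete, here's a minimal Python sketch of the counting and bucketing logic. The log layout (IP, then user-agent), the match patterns, and their ordering are my illustrative assumptions, not the actual analysis code:

  import re

  # Ordered bucket patterns. Order matters: e.g. a Lightning request can
  # also look browser-ish. The patterns themselves are illustrative guesses.
  BUCKETS = [
      ("Chandler Desktop", re.compile(r"^Chandler/")),
      ("Mozilla Calendar", re.compile(r"Lightning|Sunbird|Thunderbird")),
      ("iCal 3.x",         re.compile(r"iCal 3\.")),
      ("Web browser",      re.compile(r"Firefox|Safari|MSIE|Opera")),
      ("Other",            re.compile(r"iCal 2\.|Evolution|NetNewsWire")),
  ]

  def bucket_for(user_agent):
      """Return a bucket name, or None for unrecognized agents and robots."""
      for name, pattern in BUCKETS:
          if pattern.search(user_agent):
              return name
      return None  # not recognized: deliberately not counted

  def daily_counts(log_lines):
      """One 'hit' per unique (IP, user-agent) pair, sorted into buckets."""
      seen = set()
      counts = dict((name, 0) for name, _ in BUCKETS)
      for line in log_lines:
          parts = line.split(None, 1)      # assumed "IP user-agent" layout
          if len(parts) != 2:
              continue
          ip, user_agent = parts[0], parts[1].strip()
          bucket = bucket_for(user_agent)
          if bucket and (ip, user_agent) not in seen:
              seen.add((ip, user_agent))
              counts[bucket] += 1
      return counts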

+ Undercounting and double-counting

If we don't recognize a user-agent, it's not counted; the same goes for robots and web spiders. This means we're undercounting total usage a little, and new clients aren't counted until I add them to the logic.

Since we include the IP address in the pair, we will undercount when multiple people use the same client from behind a shared firewall or NAT.

We will double-count people using the same client at home and at work. We will double-count people using both Chandler and a web browser from the same machine (there are a fair number of these). We will double-count people who upgraded an app mid-day, since the version number in the user-agent changes.

We will undercount anyone using Chandler Server that's not on Chandler Hub. We will undercount anyone using Chandler Desktop but not syncing to the Hub.

Whether you use an app once in a day or 500 times, you get one "hit" in the above metric. The goal is to count "people"; IP + user-agent serves as a proxy for that. Given our design, there's no real way to link people using Chandler Desktop to those following a ticketed URL to the Hub, for instance.

How does this metric fall short? In lots of ways. What we'd really like to understand are the classic marketing dimensions of recency, frequency, and depth of interaction. In particular, we should establish a better way to understand our "regular" users: we have no functional definition of "regular", and no good way to measure it even if we did.

+ Writes vs reads as proxy for "regular user"?

One way to define a "regular" user might be "writes vs. reads". Intuitively, a regular user makes changes to an event or to-do, while a passive user might just view occasionally or sync in the background. So if we had a way to separate "people who have made a change this week" from "people who made no changes", we might have the beginnings of a "regular user" vs. "casual user" metric.
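
To make that concrete, here's a rough sketch of how the split might be computed from access logs, assuming the logs record the HTTP method for each request (the exact set of "write" methods here is an illustrative guess):

  # "Write" methods that modify data over WebDAV/CalDAV; the exact set
  # is an illustrative assumption.
  WRITE_METHODS = set(["PUT", "POST", "DELETE", "PROPPATCH", "MKCALENDAR"])

  def split_writers_readers(requests):
      """requests: iterable of (ip, user_agent, http_method) tuples.
      Returns (writers, read_only) as sets of (IP, user-agent) pairs."""
      writers, everyone = set(), set()
      for ip, user_agent, method in requests:
          pair = (ip, user_agent)          # same person-proxy as above
          everyone.add(pair)
          if method.upper() in WRITE_METHODS:
              writers.add(pair)
      return writers, everyone - writers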

I'd be curious to hear reactions to the "regular == makes changes" idea. It might be an interesting metric to watch, but I worry that it would exclude lots of great users who get substantial value from the Chandler Project but don't make many changes to their lists.

+ Next steps?

I suspect I've nearly wrung out all the info available in the current Chandler Server log files. It's not hard to think of great next-step metrics to measure, but the ideas need to be finessed into what's actually feasible and translated into implementable features that are incrementally better than what we have now. In particular, we'll probably work on a Chandler Server feature to record what actions are actually occurring in the MC and Atom protocols (the number of items changed, for instance).

+ Big dips in usage are a measurement artifact

A note on days like Nov 3rd and Oct 31st where the traffic seems to plummet: this is an artifact of the measurement, not a real drop in usage. Because of a failure to plan properly on my part, my metric analysis code only works correctly with one log file per day. On days when I update the Hub, production generates multiple files, and only one gets counted. It's obviously possible to fix this, but it will require a significant amount of refactoring. It's on the plan, but it's a relnote for now. My apologies.
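
For the curious, the eventual fix will probably amount to something like the sketch below: take the union of (IP, user-agent) pairs across every log file for the day instead of reading a single file. (The file-naming pattern here is an assumption.)

  import glob

  def pairs_for_day(log_dir, date_stamp):
      """Union of (IP, user-agent) pairs across ALL of a day's log files."""
      pairs = set()
      for path in glob.glob("%s/access.%s*.log" % (log_dir, date_stamp)):
          for line in open(path):
              parts = line.split(None, 1)
              if len(parts) == 2:
                  pairs.add((parts[0], parts[1].strip()))
      return pairs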

-- Jared