On 10/25/02 5:47 PM Jeremy Wadsack ([EMAIL PROTECTED]) wrote:

>Estimating visits, sessions or unique users is prone to error. It may
>be "easy" but that doesn't mean it's right. It's possible to get a
>very close estimate of sessions with some fancy heuristics, but even
>that has at least a 10% margin of error.

Let me put it a slightly different way. Any system we might come up with to count visits, or users, would really be counting something else. HTTP doesn't track visit or user information, and there is no way to recreate that information exactly from the logs.

Other programs "solve" this by inventing a rule for counting and calling the things it counts visits or users. For example, a common rule is that a visit is a sequence of requests from a single host with no gap between requests of more than half an hour. That, however, has little if anything to do with what an (impossible) all-knowing "true" analysis would call a visit.

There are many ways in which rules like that one miscount things. The most common problem is that AOL uses proxy clusters. Each "true visit" results in hits on your server from perhaps ten different hosts (all part of a single cluster). If most of your visitors are AOL users, your so-called visit counts would be six to ten times what the "real" number would be. Similar problems come up with user counting.

You can come up with more complex rules and get closer to the "real" numbers, but even then you will never be able to get an exact count. Even the best rules can sometimes be way off, and there isn't any obvious way to know when that is happening.

Despite the non-reality of numbers produced by rules like the one I gave above, many people find them useful. The absolute number might not mean anything, but it can be meaningful to do things like comparing this week's count to last week's count.

This is where the political issues come in. We can never know users or visits, but we can know the value of various synthetic measures, for example the ones that other programs misleadingly call users or visits. Is the value of having the synthetic numbers greater than the confusion they cause? While they are useful in carefully chosen contexts, they can be incredibly misleading if presented to less knowledgeable people as if they were the real thing.

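To make the half-hour rule above concrete, here is a minimal sketch of it in Python. It is only an illustration, not how any particular log analyzer does it. The function name `count_visits`, the `(host, timestamp)` input format, and the sample host names are all made up for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# The "half an hour" threshold from the rule described above.
VISIT_GAP = timedelta(minutes=30)


def count_visits(requests, gap=VISIT_GAP):
    """Count synthetic "visits" under the half-hour rule.

    `requests` is an iterable of (host, timestamp) pairs (a hypothetical
    format for this sketch). A visit is a run of requests from a single
    host with no gap between consecutive requests longer than `gap`.
    """
    by_host = defaultdict(list)
    for host, ts in requests:
        by_host[host].append(ts)

    visits = 0
    for times in by_host.values():
        times.sort()
        visits += 1  # the first request from a host always opens a visit
        for prev, cur in zip(times, times[1:]):
            if cur - prev > gap:
                visits += 1  # a gap longer than the threshold opens a new visit
    return visits


t0 = datetime(2002, 10, 25, 17, 0)

# One person, one host, three requests within a few minutes.
direct = [("dialup-1.example.net", t0 + timedelta(minutes=m)) for m in (0, 2, 5)]

# The same three requests, each arriving through a different proxy
# in a cluster (host names are invented).
proxied = [
    ("proxy-a.example.net", t0),
    ("proxy-b.example.net", t0 + timedelta(minutes=2)),
    ("proxy-c.example.net", t0 + timedelta(minutes=5)),
]

direct_visits = count_visits(direct)
proxied_visits = count_visits(proxied)
```

With these made-up inputs the rule reports one visit for the direct case and three for the proxied case, even though both describe a single person browsing for five minutes. That is the proxy-cluster inflation in miniature: the rule counts hosts and gaps, not people.
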
I'll end with a little story. I had a client who really, really wanted visit counts. So I got them a program which produced visit counts using the rule I gave above. The client went on and on about wanting the most accurate possible information, so when I found another program that produced more accurate numbers I switched them to that.

Unfortunately they were very upset with the more accurate numbers. Because the new program corrected for AOL proxy clusters, their visit counts were one fifth of what they used to be. Since they appeared to management to have lost four fifths of their audience, their project was canceled.

I can't say that the moral of this story is completely clear, but it does show one of the possible risks of using synthetic numbers and representing them as if they were real.

Jason

-----------------
[EMAIL PROTECTED]
-----------------
Dr. Seuss books . . . can be read and enjoyed on several levels. For
example, 'One Fish Two Fish, Red Fish Blue Fish' can be deconstructed
as a searing indictment of the narrow-minded binary counting system.
  -- Peter van der Linden, Expert C Programming, Deep C Secrets

+------------------------------------------------------------------------
|  This is the analog-help mailing list. To unsubscribe from this
|  mailing list, go to
|    http://lists.isite.net/listgate/analog-help/unsubscribe.html
|
|  List archives are available at
|    http://www.mail-archive.com/analog-help@lists.isite.net/
|    http://lists.isite.net/listgate/analog-help/archives/
|    http://www.tallylist.com/archives/index.cfm/mlist.7
+------------------------------------------------------------------------