On 10/25/02 5:47 PM Jeremy Wadsack ([EMAIL PROTECTED]) wrote:

>Estimating visits, sessions or unique users is prone to error. It may
>be "easy" but that doesn't mean it's right. It's possible to get a
>very close estimate of sessions with some fancy heuristics, but even
>that has at least a 10% margin of error.

Let me put it a slightly different way. Any system we might come up with 
to count visits, or users, would really be counting something else. HTTP 
doesn't track visit or user information and there is no way to recreate 
that information exactly.

Other programs "solve" this by inventing a rule for counting and calling 
those things visits or users. For example, a common rule is that a visit 
is a sequence of requests from a single host with no gap between requests 
of more than half an hour. That, however, has little if anything to do 
with what an (impossible) all-knowing "true" analysis would call a visit.
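
To make that rule concrete, here is a rough sketch of it in Python. The names count_visits and max_gap, the (host, timestamp) input shape, and the sample hosts are all made up for illustration; this is not how any particular log analyzer implements it.

```python
from datetime import datetime, timedelta


def count_visits(requests, max_gap=timedelta(minutes=30)):
    """Count "visits" under the host-plus-timeout rule.

    requests: iterable of (host, timestamp) pairs, in any order.
    (Hypothetical input shape; a real log would need parsing first.)
    """
    last_seen = {}  # host -> time of that host's most recent request
    visits = 0
    for host, when in sorted(requests, key=lambda r: r[1]):
        previous = last_seen.get(host)
        # A new visit starts on a host's first request, or when the gap
        # since that host's previous request exceeds max_gap.
        if previous is None or when - previous > max_gap:
            visits += 1
        last_seen[host] = when
    return visits


t0 = datetime(2002, 10, 25, 12, 0)
sample = [
    ("host-a.example", t0),
    ("host-a.example", t0 + timedelta(minutes=10)),
    ("host-a.example", t0 + timedelta(minutes=50)),  # 40-minute gap
    ("host-b.example", t0 + timedelta(minutes=5)),
]
print(count_visits(sample))  # 3
```

Note that the function only ever sees host names and times. Whatever it returns is a count of host-and-gap patterns, which is exactly the "something else" I mean.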

There are many, many ways in which rules like that one miscount things. 
The most common problem is that AOL uses proxy clusters. Each "true 
visit" results in hits on your server from perhaps ten different hosts 
(all part of a single cluster). If most of your visitors are AOL users, 
your so-called visit counts would be six to ten times what the "real" 
number would be. Similar problems come up with user counting.
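
A toy illustration of the proxy problem, again in Python. Everything here is an assumption made for the example: the proxy host names, the round-robin rotation (real clusters may assign requests quite differently), and the compact count_visits helper, which applies the half-hour rule described above.

```python
from datetime import datetime, timedelta


def count_visits(requests, max_gap=timedelta(minutes=30)):
    """Host-plus-timeout rule: a new visit per host after a long gap."""
    last_seen, visits = {}, 0
    for host, when in sorted(requests, key=lambda r: r[1]):
        if host not in last_seen or when - last_seen[host] > max_gap:
            visits += 1
        last_seen[host] = when
    return visits


def one_visit_via_cluster(start, n_requests, n_proxies):
    """One person's requests, rotated round-robin over proxy hosts.

    Round-robin is an assumption; it just keeps the example deterministic.
    """
    return [
        (f"proxy-{i % n_proxies}.example", start + timedelta(seconds=30 * i))
        for i in range(n_requests)
    ]


start = datetime(2002, 10, 25, 12, 0)
requests = one_visit_via_cluster(start, n_requests=20, n_proxies=10)
print(count_visits(requests))  # 10
```

One person browsing for ten minutes shows up as ten "visits", because the rule has no way to tell that the ten hosts are the same visitor.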

You can come up with more complex rules, and get closer to what the 
"real" numbers are, but even then you will never be able to get an exact 
count. Even the best rules can sometimes be way off, and there isn't any 
obvious way to know when that is happening.

Despite the non-reality of numbers produced by rules like the one I gave 
above, many people find them useful. The absolute number might not mean 
anything, but it can be meaningful to do things like comparing this week's 
count to last week's count.

This is where the political issues come in. We can never know users or 
visits, but we can know the value of various synthetic measures, for 
example the ones that other programs misleadingly call users or visits. 
Is the value of having the synthetic numbers greater than the confusion 
they cause? While they are useful in carefully chosen contexts, they can 
be incredibly misleading if presented to less knowledgeable people as if 
they were the real thing.

I'll end with a little story. I had a client who really, really wanted 
visit counts. So I got them a program which produced visit counts using 
the rule I gave above. The client went on and on about wanting the most 
accurate possible information, so when I found another program that 
produced more accurate numbers I switched them to that. Unfortunately 
they were very upset with the more accurate numbers. Because the new 
program corrected for AOL proxy clusters, their visit counts were one 
fifth of what they used to be. Since they appeared to management to have 
lost four fifths of their audience, their project was canceled. I can't 
say that the moral of this story is completely clear, but it does show 
one of the possible risks of using synthetic numbers and representing 
them as if they were real.

Jason

-----------------
[EMAIL PROTECTED]
-----------------
Dr. Seuss books . . . can be read and enjoyed on several levels. For
example, 'One Fish Two Fish, Red Fish Blue Fish' can be deconstructed
as a searing indictment of the narrow-minded binary counting system.
  -- Peter van der Linden, Expert C Programming, Deep C Secrets

