At 15:22 13.02.2002 -0800, Ryan Parr wrote:
>Nothing special to the way these sites work. You can check out
>http://www.rileyjames.com and http://www.ryanparr.com (the programming on
>the latter will leave you in awe :) I want to host my sites and have a
>decent usage statistics location, but I just can't seem to get the logging
>part down. I've got a long road ahead of me :)
>
>For instance, the code below logs the following on entrance to
>rileyjames.com (setup as PerlFixupHandler):
>www.rileyjames.com      /       Wed Feb 13 16:17:15 2002
>www.rileyjames.com      /index.html     Wed Feb 13 16:17:15 2002
>www.rileyjames.com      /topnavigation.htm      Wed Feb 13 16:17:15 2002
>www.rileyjames.com      /white.htm      Wed Feb 13 16:17:15 2002
>www.rileyjames.com      /green.htm      Wed Feb 13 16:17:15 2002
>www.rileyjames.com      /index1.htm     Wed Feb 13 16:17:15 2002
>www.rileyjames.com      /topnav.css     Wed Feb 13 16:17:15 2002
>www.rileyjames.com      /graphics/redarrow.gif  Wed Feb 13 16:17:15 2002
>www.rileyjames.com      /border.css     Wed Feb 13 16:17:15 2002
>www.rileyjames.com      /text.css       Wed Feb 13 16:17:15 2002
>www.rileyjames.com      /graphics/frontpaglogo.gif      Wed Feb 13 16:17:15 2002


The problem you seem to be having is this:
1) The client is sent the main page as HTML (index.html).
2) As this file references many other URLs (images, CSS files, frames, 
etc.), the client knows it will need those files too, so it sends out new 
requests for them, often several at the same time.
3) Apache processes each of these new requests independently, with no way 
of knowing that they were all triggered by the same page.

You're running into one problem (and feature) of the HTTP protocol: it's 
stateless, so the httpd has no way of knowing that any two requests are 
related.
There are some ways of working around this, though. It has been tried over 
and over again, and as many people know, getting reliable statistics on 
visits (and similar metrics) is pretty hard. Here are some possible 
solutions:
1) As you're using frames on rileyjames.com, you could log only requests for 
/topnavigation.htm, which is loaded only once per visit. Of course, that only 
gives you a count of visits, which is probably not everything you want.
2) Treat all requests from one IP within a short time window as a single 
visit: for example, after the first request from a given IP, ignore all 
further requests from that IP for 5 seconds. The problems here are that:
         - IP addresses aren't a reliable way to identify visitors (there is 
no one-to-one mapping between IPs and computers, because of NAT and proxies)
         - The first page might not have reached the logging phase yet when 
the other files are requested (although this is unlikely)
3) What I think is the best solution: mark only some pages as loggable. 
Either log only specific pages, say the HTML files of your choice and some 
big pictures, *or* add a query string to the URLs you do (or don't) want 
logged.
For example, /graphics/frontpaglogo.gif?log=yes still returns the image, but 
your logger can read the query string and decide whether to log the request.

There are probably many other solutions. Just remember that while the line 
return DECLINED unless($r->is_main()); is useful for skipping subrequests, it 
won't help you at all here, because the requests you're seeing really are 
separate requests.



-- 
Per Einar Ellefsen
[EMAIL PROTECTED]
