Re: [Tutor] Web Stats
Kent Johnson wrote:
On Wed, Jun 11, 2008 at 3:58 PM, Stephen Nelson-Smith [EMAIL PROTECTED] wrote:
Hello, This has to include resources which have not been visited, as the point is to clean out old stuff.

Wouldn't a 'find' for files with an ancient access time be a better way of finding out?

Also, http://ch.tudelft.nl/~arthur/webcheck/ is useful if you have the site up and running. It will give you some statistics about broken pages and old ones.

--
~noufal
http://nibrahim.net.in/
___
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
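The 'find' suggestion can also be sketched in Python. This is only an illustration, assuming the site is served from static files under a single directory and that the filesystem actually records access times (mounts using noatime or relatime will make atime unreliable). The function name `stale_files` and its parameters are made up for this example.

```python
import os
import time


def stale_files(root, days, now=None):
    """Yield (path, atime) for files under `root` not accessed in `days` days.

    `root` and `days` are hypothetical parameters for this sketch; the
    result is only as good as the filesystem's atime bookkeeping.
    """
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # seconds in a day
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                atime = os.stat(path).st_atime
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
            if atime < cutoff:
                yield path, atime


# Example use (the path is a placeholder):
# for path, atime in stale_files("/var/www/html", days=365):
#     print(time.strftime("%Y-%m-%d", time.localtime(atime)), path)
```

Note that backups, indexers and virus scanners also touch access times, so a recent atime does not prove that a visitor requested the file.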
Re: [Tutor] Web Stats
On Wed, Jun 11, 2008 at 3:58 PM, Stephen Nelson-Smith [EMAIL PROTECTED] wrote:
Hello, This has to include resources which have not been visited, as the point is to clean out old stuff.

Ah, I missed that part. Take a look at AWStats (not Python).

Doesn't this 'only' parse weblogs?

It parses them and displays the stats. But it is focused on presentation, probably not what you need.

I'd still need some kind of spider to tell me all the possible resources available wouldn't I? It's a big website, with 1000s of pages.

I guess, unless you can figure it out from the backend somehow. For example if it's all static files you can walk the file system instead of the site.

Kent
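Walking the file system instead of the site might look roughly like this. It is a sketch that assumes every URL path maps directly onto a file below one document root (no aliases, rewrites or dynamic pages); `list_resources` and `docroot` are names invented for the example.

```python
import os


def list_resources(docroot):
    """Return the set of URL paths for every file found below `docroot`.

    Assumes a plain static site where /a/b.html is served from
    <docroot>/a/b.html; that mapping is an assumption of this sketch.
    """
    paths = set()
    for dirpath, _dirnames, filenames in os.walk(docroot):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, docroot)
            # Convert the OS path separator to the URL separator.
            paths.add("/" + rel.replace(os.sep, "/"))
    return paths


# Example use (the path is a placeholder):
# for url_path in sorted(list_resources("/var/www/html")):
#     print(url_path)
```

Unlike a spider, this also finds files that nothing links to any more.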
Re: [Tutor] Web Stats
On Jun 11, 2008, at 12:58 PM, Stephen Nelson-Smith wrote:
Take a look at AWStats (not Python).

Doesn't this 'only' parse weblogs? I'd still need some kind of spider to tell me all the possible resources available wouldn't I? It's a big website, with 1000s of pages.

If you have pages which are no longer referenced from any root pages then a spider won't find them. These dangling pages are precisely the sort of thing you're trying to remove. Consider other options such as looking through the filesystem.

- Jeff Younker - [EMAIL PROTECTED] -
[Tutor] Web Stats
Hi,

I've been asked to produce a report showing all possible resources in a website, together with statistics on how frequently they've been visited. Nothing fancy - just number and perhaps date of last visit. This has to include resources which have not been visited, as the point is to clean out old stuff.

I have several years of apache weblogs. Is there something out there that already does this? If not, or if it's interesting and not beyond the ken of a reasonable programmer, could anyone provide some pointers on where to start?

Thanks,
S.
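As one possible starting point for the log side of the report, the sketch below counts hits and records the last visit per path. It assumes the logs are in Apache's Common or Combined Log Format and that month names parse under an English/C locale; the regular expression and the name `visit_stats` are illustrative, not taken from any existing tool.

```python
import re
from datetime import datetime

# Matches the start of a Common/Combined Log Format line, e.g.
# 1.2.3.4 - - [10/Jun/2008:13:55:36 +0100] "GET /a.html HTTP/1.1" 200 2326
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3})'
)


def visit_stats(lines):
    """Return {path: (hit_count, last_visit)} for the given log lines.

    Lines that do not match the assumed format are skipped. Every status
    code is counted; filter on the 'status' group if only successful
    requests should count as visits.
    """
    stats = {}
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue
        # Drop any query string so /a.html?x=1 counts as /a.html.
        path = match.group("url").split("?", 1)[0]
        when = datetime.strptime(match.group("when"), "%d/%b/%Y:%H:%M:%S %z")
        count, last = stats.get(path, (0, when))
        stats[path] = (count + 1, max(last, when))
    return stats


# Example use (the file name is a placeholder):
# with open("access.log") as log:
#     stats = visit_stats(log)
```

Because the function takes any iterable of lines, several years of rotated logs can be fed through it one file after another and the results merged.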
Re: [Tutor] Web Stats
Hello,

This has to include resources which have not been visited, as the point is to clean out old stuff.

Take a look at AWStats (not Python).

Doesn't this 'only' parse weblogs? I'd still need some kind of spider to tell me all the possible resources available wouldn't I? It's a big website, with 1000s of pages.

For do it yourself, loghetti might be a good starting point http://code.google.com/p/loghetti/

Looks interesting, but again don't I fall foul of the "how can I know about what, by definition, doesn't feature in a log?" problem?

S.

Kent
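One way around the "not in the log" problem is to take the full list of resources from somewhere other than the log (for example the file system) and treat anything missing from the log statistics as having zero visits. The sketch below assumes two hypothetical inputs: a set of all URL paths and a dict mapping path to (hit count, last visit). `build_report` is a name made up for the example.

```python
def build_report(all_paths, stats):
    """Return (path, hits, last_visit) rows, least-visited first.

    `all_paths` is every known resource; `stats` maps path to
    (hits, last_visit). Paths absent from `stats` get 0 hits and None.
    Paths that appear only in `stats` (e.g. already deleted) are ignored.
    """
    rows = []
    for path in sorted(all_paths):
        hits, last_visit = stats.get(path, (0, None))
        rows.append((path, hits, last_visit))
    # Stable sort: ties on hit count stay in alphabetical path order.
    rows.sort(key=lambda row: row[1])
    return rows


# Example use with made-up data:
# report = build_report({"/a.html", "/old.html"}, {"/a.html": (5, "2008-06-10")})
# for path, hits, last_visit in report:
#     print(hits, last_visit, path)
```

The rows with zero hits at the top of the report are the candidates for cleaning out.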