Re: [Tutor] Web Stats

2008-06-15 Thread Noufal Ibrahim

Kent Johnson wrote:

On Wed, Jun 11, 2008 at 3:58 PM, Stephen Nelson-Smith
[EMAIL PROTECTED] wrote:

Hello,


 This has to include resources which have not been visited, as the
point is to clean out old stuff.


Wouldn't a 'find' for files with an ancient access time be a better
way of finding out? Also, http://ch.tudelft.nl/~arthur/webcheck/ is
useful if you have the site up and running. It will give you some
statistics about broken pages and old ones.
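
If you would rather stay in Python than shell out to 'find', the same idea can be sketched roughly as below. This is only an illustration: the function name stale_files and its parameters are made up here, and it assumes the filesystem actually records access times (mounts using noatime or relatime make st_atime unreliable).

```python
import os
import time

def stale_files(root, max_age_days, now=None):
    """Yield (path, atime) for files under root not accessed in max_age_days.

    Hypothetical helper for illustration; relies on st_atime being maintained.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds in a day
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                atime = os.stat(path).st_atime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if atime < cutoff:
                yield path, atime
```

Calling stale_files() on the document root with, say, 365 days would list candidates for cleanup, but anything that reads the files (backups, indexers) also bumps the access time, so treat the result as a hint rather than proof.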



--
~noufal
http://nibrahim.net.in/
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Web Stats

2008-06-11 Thread Kent Johnson
On Wed, Jun 11, 2008 at 3:58 PM, Stephen Nelson-Smith
[EMAIL PROTECTED] wrote:
 Hello,

  This has to include resources which have not been visited, as the
 point is to clean out old stuff.

Ah, I missed that part.

 Take a look at AWStats (not Python).

 Doesn't this 'only' parse weblogs?

It parses them and displays the stats. But it is focused on
presentation, probably not what you need.

 I'd still need some kind of spider
 to tell me all the possible resources available wouldn't I?  It's a
 big website, with 1000s of pages.

I guess, unless you can figure it out from the backend somehow. For
example if it's all static files you can walk the file system instead
of the site.
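
A rough sketch of walking the file system, assuming every file under the document root is served at the URL path matching its relative location (no rewrites, aliases or dynamic pages). The names site_resources and usage_report are invented for this example, and 'hits' is just a dict of URL path to visit count from whatever log analysis you end up using.

```python
import os

def site_resources(docroot):
    """Return the set of URL paths for every file under docroot.

    Assumes a plain static site where file layout mirrors URL layout.
    """
    resources = set()
    for dirpath, dirnames, filenames in os.walk(docroot):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), docroot)
            resources.add("/" + rel.replace(os.sep, "/"))
    return resources

def usage_report(resources, hits):
    """Pair every resource with its hit count (0 if never logged).

    Returns (count, url) tuples sorted so the least-visited come first.
    """
    return sorted((hits.get(url, 0), url) for url in resources)
```

Because the report is driven by the files on disk rather than by the logs, resources that were never visited still show up, with a count of zero.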

Kent


Re: [Tutor] Web Stats

2008-06-11 Thread Jeff Younker

On Jun 11, 2008, at 12:58 PM, Stephen Nelson-Smith wrote:

Take a look at AWStats (not Python).


Doesn't this 'only' parse weblogs?  I'd still need some kind of spider
to tell me all the possible resources available wouldn't I?  It's a
big website, with 1000s of pages.


If you have pages which are no longer referenced from any root
pages then a spider won't find them.  These dangling pages
are precisely the sort of thing you're trying to remove.   Consider
other options such as looking through the filesystem.

- Jeff Younker - [EMAIL PROTECTED] -



[Tutor] Web Stats

2008-06-11 Thread Stephen Nelson-Smith
Hi,

I've been asked to produce a report showing all possible resources in
a website, together with statistics on how frequently they've been
visited.  Nothing fancy - just number and perhaps date of last visit.
 This has to include resources which have not been visited, as the
point is to clean out old stuff.

I have several years of apache weblogs.

Is there something out there that already does this?  If not, or if
it's interesting and not beyond the ken of a reasonable programmer,
could anyone provide some pointers on where to start?
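
As one possible starting point for the log side, here is a minimal sketch that counts visits and records the date of the last visit per resource. It assumes the logs are in Apache's Common or Combined Log Format with English month abbreviations; the regular expression and the name summarize are illustrative only, and query strings are stripped so /a.html?x=1 counts as /a.html.

```python
import re
from datetime import datetime

# Matches the timestamp and request of a Common/Combined Log Format line, e.g.
# 127.0.0.1 - - [10/Oct/2007:13:55:36 +0000] "GET /a.html HTTP/1.0" 200 2326
LOG_LINE = re.compile(r'\[(?P<when>[^\]]+)\] "(?P<method>\S+) (?P<url>\S+)')

def summarize(lines):
    """Return {url: (hit_count, last_visit)} for the given log lines."""
    stats = {}
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # not a request line we recognise
        try:
            when = datetime.strptime(match.group("when"),
                                     "%d/%b/%Y:%H:%M:%S %z")
        except ValueError:
            continue  # timestamp in an unexpected format
        url = match.group("url").split("?", 1)[0]  # drop any query string
        count, last = stats.get(url, (0, when))
        stats[url] = (count + 1, max(last, when))
    return stats
```

Passing an open log file to summarize() works because it only iterates over lines. Note that this counts every logged request, including 404s and hits from robots, so some filtering on status code or user agent is probably wanted before trusting the numbers.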

Thanks,

S.


Re: [Tutor] Web Stats

2008-06-11 Thread Stephen Nelson-Smith
Hello,

  This has to include resources which have not been visited, as the
 point is to clean out old stuff.

 Take a look at AWStats (not Python).

Doesn't this 'only' parse weblogs?  I'd still need some kind of spider
to tell me all the possible resources available wouldn't I?  It's a
big website, with 1000s of pages.

 For do it yourself, loghetti
 might be a good starting point
 http://code.google.com/p/loghetti/

Looks interesting, but again, don't I fall foul of the "how can I know
about what, by definition, doesn't feature in a log?" problem?

S.
 Kent
