This mail is an automated notification from the task tracker
 of the project: Gna! Administration.

/**************************************************************************/
[task #119] Latest Modifications:

Changes by: 
                Mathieu Roy <[EMAIL PROTECTED]>
Date: 
                Wed 06.10.2004 at 18:41 (Europe/Paris)

------------------ Additional Follow-up Comments ----------------------------
I totally agree about the default style.  Feel free to improve :)

Running the backlog: I decided not to do it, considering that the current 
access_log alone took 5 minutes to process.  Feel free to do it, but note that 
doing so requires erasing the current content. 

By per-project split, I assume you mean splitting the Apache logs? I guess 
that could be easily achieved and, maybe, it would make the whole thing 
faster. But that's not certain, and having as many access_logs as projects can 
be... painful too. I'm not sure logrotate would enjoy it.

/**************************************************************************/
[task #119] Full Item Snapshot:

URL: <http://gna.org/task/?func=detailitem&item_id=119>
Project: Gna! Administration
Submitted by: Mathieu Roy
On: Mon 02.02.2004 at 02:00

Should Start On:  Mon 02.02.2004 at 00:00
Should be Finished on:  Thu 02.12.2004 at 00:00
Category:  Services Functionalities
Priority:  5 - Normal
Resolution:  Done
Privacy:  Public
Assigned to:  yeupou
Percent Complete:  0%
Status:  Closed
Effort:  0.00


Summary:  webalizer statistics for download area + homepage per project

Original Submission:  We should provide "webalizer statistics for download area 
+ homepage per project"

Comments
------------------


-------------------------------------------------------
Date: Wed 06.10.2004 at 17:49        By: Vincent Caron <zerodeux>
It's a very nice start: now we have stats! The improvements I see:

 - tune the default skin, which is frankly awful
 - run it once on the whole backlog, since we have everything since Jan 10
 - per-project split

-------------------------------------------------------
Date: Wed 06.10.2004 at 15:52        By: Mathieu Roy <yeupou>
I'm closing this task. From a functional point of view, it is OK, but I guess 
it could be improved. We'll have to see how it scales, anyway.

I'm about to add a FAQ entry + an announcement.

-------------------------------------------------------
Date: Wed 06.10.2004 at 15:48        By: Mathieu Roy <yeupou>
Everything is at stats.gna.org.

I also created a project "stats" (with X status, meaning it is blocked) to 
avoid any name clash, as you suggested.

-------------------------------------------------------
Date: Wed 06.10.2004 at 15:44        By: Mathieu Roy <yeupou>
OK, it is working now. But it is really, really slow (even though it does not 
seem to take much CPU, apparently), and it does not do DNS lookups. Maybe it 
would run faster if the Apache logs were split per project. But I suspect the 
slowness only comes from the fact that we are forced to call webalizer 
nProjects * 2 times.

Maybe I missed something, but I'm almost 100% sure there's no other way if you 
want to generate stats on a per-project basis.

But with per-project split Apache logs, we could skip the call to webalizer 
when no recent log exists. That could make a difference if lots of projects 
are rarely accessed.
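
A minimal sketch of that skip test, in Python rather than the Perl used by the 
Gna! scripts. It assumes a hypothetical layout with one split log per project, 
named <project>.log, in a single directory. It also assumes a record of when 
webalizer last ran for each project. Under those assumptions, webalizer is only 
called for projects whose log is non-empty and has changed since that run.

```python
from pathlib import Path


def projects_needing_update(log_dir, last_run):
    """Return the projects whose split log changed since webalizer last ran.

    Assumed layout (hypothetical): one file per project, <project>.log, in
    log_dir. last_run maps a project name to the Unix time of its previous
    webalizer run; projects never processed default to 0.
    """
    todo = []
    for log in sorted(Path(log_dir).glob("*.log")):
        st = log.stat()
        if st.st_size == 0:
            continue  # nothing logged at all: skip the webalizer call
        if st.st_mtime > last_run.get(log.stem, 0):
            todo.append(log.stem)
    return todo
```

Only the returned projects would then get their webalizer run(s). Rarely 
accessed projects cost a stat() call instead of a full webalizer invocation.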

For the record:
Wed Oct  6 15:39:22 2004 - starting
Wed Oct  6 15:43:56 2004 - work finished


-------------------------------------------------------
Date: Wed 06.10.2004 at 14:08        By: Mathieu Roy <yeupou>
The given URL does not work: I'm extending my stuff to the download system. 
Work in progress.

-------------------------------------------------------
Date: Wed 06.10.2004 at 14:05        By: Vincent Caron <zerodeux>
Analog is interesting as of version 5.9x (aka 6 beta), producing very good 
XHTML output. But it's not packaged for Debian.

I was about to finally go with webalizer, because it's well known and packaged.

For the 'stats' project: it's only meant as a placeholder, conveniently 
reserving the /stats folder in various places (at least home and download); we 
would not use the CVS, since pages would be generated directly in place.

Technically, both analog and webalizer are fast (i.e. they take a few seconds 
to parse a daily log from home.gna.org), so I'm not worried about it right now.

They can distinguish vhosts, but cannot generate per-first-level-folder 
reports. We have to write a simple 'demux' which splits an access.log per 
/$project, then run one report for each. I was thinking of running it from 
logrotate, but Apache rotations are scheduled weekly and I don't want to change 
that. So we have to logtail the access log too.
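
The 'demux' plus logtail step could look roughly like the sketch below. It is 
Python, not an actual Gna! script. `read_new_lines` keeps a byte offset between 
runs, the way logtail does, and starts over when the file shrinks after a 
rotation. `demux` groups request lines by their first path component. The 
function names and the assumption of Common/Combined Log Format lines are 
illustrative choices.

```python
import re
from collections import defaultdict

# Request part of a Common/Combined Log Format line: "GET /path HTTP/1.0"
REQUEST_RE = re.compile(r'"[A-Z]+ (/[^ "]*)')


def project_of(line):
    """Return the first path component of the request (/$project), or None."""
    m = REQUEST_RE.search(line)
    if not m:
        return None
    first = m.group(1).lstrip("/").split("/", 1)[0]
    first = first.split("?", 1)[0]  # bare /project?x=y requests
    return first or None


def read_new_lines(log_path, offset):
    """logtail-like read: return (lines appended since offset, new offset).

    If the file is now shorter than offset (Apache rotated it), restart at 0.
    An incomplete trailing line is left for the next run.
    """
    with open(log_path, "rb") as f:
        f.seek(0, 2)  # jump to the end to learn the current size
        if f.tell() < offset:
            offset = 0
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1  # 0 when no complete line was read
    lines = data[:end].decode("utf-8", "replace").splitlines()
    return lines, offset + end


def demux(lines):
    """Group log lines per project, so one webalizer report can run per project."""
    per_project = defaultdict(list)
    for line in lines:
        project = project_of(line)
        if project is not None:
            per_project[project].append(line)
    return dict(per_project)
```

Storing the returned offset between runs (in a small state file, for instance) 
keeps the weekly Apache rotation untouched. Each group can then be appended to 
a per-project log or piped to webalizer.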

Such a thing could be done in a generic manner and be easily installed in the 
right chroots (I don't like the way it is installed in core Bart; it will break 
things if we move chroots around).

BTW, it does not work for me: http://home.gna.org/stats/admin

-------------------------------------------------------
Date: Wed 06.10.2004 at 13:13        By: Mathieu Roy <yeupou>
I now have something working for the homepage -> 
http://home.gna.org/stats/$project

for instance:
http://home.gna.org/stats/admin

For the download area, I could more or less follow the same approach, but I'd 
first like to see whether it takes a lot of CPU or not.

-------------------------------------------------------
Date: Wed 06.10.2004 at 11:49        By: Mathieu Roy <yeupou>
From what I tested with analog, it produces... HTML 2.0 and is a pain in the ass 
to configure.

Simple example:

[EMAIL PROTECTED] analog --help
This is analog version 5.23/Unix
For help see docs/Readme.html, or man analog, or http://www.analog.cx/

Everything is like that; I'm currently unable to find the configuration option 
that makes it work on only one directory.

The output is really hard to read in my opinion (well, HTML 2.0 does not really 
help; apparently, if you want better output, you have to run a third-party 
configuration tool, which does not look attractive at all). So I think I'm 
going to install webalizer first, and once it is running, it will still be 
possible to add analog as an alternative.

-------------------------------------------------------
Date: Wed 06.10.2004 at 10:52        By: Mathieu Roy <yeupou>
Hum, I'm not sure it would do what we are expecting?

Creating stats will create a CVS-managed directory home.gna.org/stats, which we 
probably do not want.

Why not just write a simple script like the one proposed below? I'm going to 
take a look at that matter today or tomorrow, since I currently have some free 
time.

-------------------------------------------------------
Date: Mon 04.10.2004 at 15:17        By: Vincent Caron <zerodeux>
I'd like to give it a try this week. I propose to create the dummy project 
'stats'; this way home.gna.org/stats and download.gna.org/stats will be 
reserved for the stat outputs (we can go on with mail.gna.org/stats, etc.). If 
that's fine with you, please create it.

-------------------------------------------------------
Date: Thu 23.09.2004 at 18:40        By: Mathieu Roy <yeupou>
Vincent, do you think you'll have time to look into Analog? Otherwise, we can 
work on getting webalizer stats in the meantime.

-------------------------------------------------------
Date: Sat 18.09.2004 at 18:18        By: Nicolas LAURENT <nicoo>
How can I help make things happen?

-------------------------------------------------------
Date: Wed 02.06.2004 at 12:38        By: Mathieu Roy <yeupou>
Feel free to go for it :)

-------------------------------------------------------
Date: Tue 01.06.2004 at 15:45        By: Vincent Caron <zerodeux>
Webalizer:

 * Woody: 2.01.10, 2002/04/22
 * lang: C
 * deps: libgd
 * demo: http://demo.latinwebs.net/webalizer/
 * output: mostly HTML 4.0 compliant

Analog:

 * Woody: 5.23, 2002/05/18
 * lang: C
 * deps: libgd
 * demo: http://www.chiark.greenend.org.uk/~sret1/stats/
 * output: fully XHTML 1.0 compliant

Analog can draw good-looking pseudo-bargraphs without invoking libgd, which can 
save a lot of CPU cycles. I'd like to experiment with that one.

-------------------------------------------------------
Date: Tue 01.06.2004 at 12:14        By: Vincent Caron <zerodeux>
RRD is just about graphing some data; it's the underlying tool for MRTG.

I've made a little experiment where I graph per-site total hits, loaded pages 
and bandwidth ('hometraffic' script attached). The idea was to make it simple 
and light enough to run every 5 minutes, in order to have a real-time measure 
just like with MRTG. I'm not very satisfied: it can grow CPU intensive in some 
cases.
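
For comparison, a light per-site counter in the spirit of the 'hometraffic' 
experiment might look like the sketch below. It is not the attached script. 
Here a "site" is assumed to be the first path component (/$project). What 
counts as a "page" is also an assumption: directory URLs, extension-less names 
and .html/.htm/.php files. Webalizer itself decides this through its PageType 
setting.

```python
import re
from collections import Counter

# Common/Combined Log Format: ... "METHOD /path PROTO" status bytes ...
LINE_RE = re.compile(r'"[A-Z]+ (/[^ "]*)[^"]*" (\d{3}) (\d+|-)')


def is_page(path):
    """Assumed rule: directory URLs, extension-less names and .html/.htm/.php."""
    last = path.rsplit("/", 1)[-1]
    return last == "" or "." not in last or last.endswith((".html", ".htm", ".php"))


def traffic_by_project(lines):
    """Return {project: (hits, pages, bytes)} keyed on the first path component."""
    hits, pages, nbytes = Counter(), Counter(), Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not a request line we understand
        path, _status, size = m.groups()
        path = path.split("?", 1)[0]
        project = path.lstrip("/").split("/", 1)[0]
        if not project:
            continue  # requests for "/" itself belong to no project
        hits[project] += 1
        if is_page(path):
            pages[project] += 1
        if size != "-":
            nbytes[project] += int(size)
    return {p: (hits[p], pages[p], nbytes[p]) for p in hits}
```

Each request costs one regex match and three counter updates. Whether that 
stays cheap enough every 5 minutes depends on how many lines are fed in per run.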

Anyway, this gives us some code to demux per-site data that we could feed to 
webalizer. The simplest way is to do the analysis during the Apache log 
rotation (this way we don't need logtail). I'd suggest having it rotate twice 
a day for now.

BTW, I'm used to analog, which is also technically very nice. It would be 
cool to make a few comparisons to see which one fits better.

-------------------------------------------------------
Date: Fri 28.05.2004 at 15:47        By: Mathieu Roy <yeupou>
I'm not generally satisfied with rrd, but the truth is that I'm not very 
familiar with it. Can you generate data as meaningful as webalizer's with rrd?

I was also interested in the idea of giving projects their logs, not just 
statistics, but that's not a priority.

-------------------------------------------------------
Date: Thu 27.05.2004 at 18:02        By: Vincent Caron <zerodeux>
It would be very simple to feed an rrd base from the logs and have a 
per-project throughput graph. I just figured out that barely 20 lines of Perl 
called from logrotate on access.log would do the trick. What do you think?
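
A rough sketch of the rrd side, in Python. It assumes per-project byte totals 
for each interval are already available from a log pass. It only builds 
`rrdtool create` / `rrdtool update` command lines. The ABSOLUTE data source 
type fits a value meaning "bytes since the last update", which rrdtool turns 
into bytes per second. The 5-minute step, the archive sizes and the 
<project>.rrd naming are arbitrary choices for illustration.

```python
import os
import subprocess


def rrd_create_cmd(rrd_file, step=300):
    """argv creating a per-project RRD for bytes served.

    ABSOLUTE fits a value meaning "bytes since the previous update"; rrdtool
    stores it as bytes/second. Keeps one day at full resolution and one year
    of daily averages (arbitrary choices for this sketch).
    """
    per_day = 86400 // step
    return [
        "rrdtool", "create", rrd_file,
        "--step", str(step),
        f"DS:bytes:ABSOLUTE:{2 * step}:0:U",
        f"RRA:AVERAGE:0.5:1:{per_day}",
        f"RRA:AVERAGE:0.5:{per_day}:365",
    ]


def rrd_update_cmd(rrd_file, nbytes, timestamp=None):
    """argv for `rrdtool update`; "N" stands for "now"."""
    when = "N" if timestamp is None else str(int(timestamp))
    return ["rrdtool", "update", rrd_file, f"{when}:{int(nbytes)}"]


def feed(rrd_dir, bytes_by_project, run=subprocess.run):
    """Create missing RRDs, then push one value per project (hypothetical layout)."""
    for project, nbytes in sorted(bytes_by_project.items()):
        path = os.path.join(rrd_dir, f"{project}.rrd")
        if not os.path.exists(path):
            run(rrd_create_cmd(path), check=True)
        run(rrd_update_cmd(path, nbytes), check=True)
```

Calling `feed` from the logrotate hook would match the proposal above. The 
graphs themselves would come from `rrdtool graph` on the same files.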



-------------------------------------------------------
Date: Tue 18.05.2004 at 17:20        By: Mathieu Roy <yeupou>
« The purpose seems to update web pages from cvs, is it ? »

Yes. It just accesses a text file giving the unix names of the projects and 
then downloads the stuff.

The list of projects is generated by a script that lives outside of the 
chroot. This allows the www chroot to run without a mysql client or mysql 
access.

Apart from that, Gna! uses the backend with nothing really specific.

In the case that interests us, the issue is not really related to Savane: 
what the script has to do is only related to Apache and webalizer.

As I detailed before:

The best is to have a script that takes as arguments on the command line:

  --conffile= path to the configuration file, like /var/webalizer/group.conf
  --logfile= path to the Apache log, like /var/log/apache/bygroup/group.log
  --outputdir= path to the output directory for webalizer,
               like /var/www/group/webalizer/
  --title= title for the webalizer results

It would then be easy to adapt the script homepage-update.pl to call that 
script.

No group info other than the system name is available. The home.gna.org 
system is in a chroot that has no access to the database or anything else.

--

Another script (a trivial one) like homepage-update.pl should be written to run 
webalizer on each conffile, and that script will be run via cron. In fact, 
homepage-update.pl would only need very trivial changes for that.

-------------------------------------------------------
Date: Tue 18.05.2004 at 16:40        By: David Jobet <djobet>
OK, I've read the file quickly. The purpose seems to be to update web pages 
from CVS, is that right?

Is there some documentation somewhere detailing the installation process 
between Savane and Gna!?

I mean, I've installed Savane, but what else do I need to install? (It's hard 
to guess the next step when you're in the dark.)

-------------------------------------------------------
Date: Tue 18.05.2004 at 16:20        By: Mathieu Roy <yeupou>
Hum, sorry, I didn't have time to follow the issue.

The script is not part of Savane (as it does not interact at all with the 
Savane software) but of the Gna! scripts. I'm attaching it to this item.

-------------------------------------------------------
Date: Tue 18.05.2004 at 15:53        By: David Jobet <djobet>
I've installed and run Savane (at least partially) on my home system, and I've 
posted to savane-dev but got no answer.

Perhaps it was not the right place.

However, where can I find the homepage-update.pl script you're talking about?

-------------------------------------------------------
Date: Sat 08.05.2004 at 20:00        By: Mathieu Roy <yeupou>
No, it is not complex to set up webalizer. However, we need to make Apache 
save the logs for each area in separate files. That's not a big task either; 
it's doable. But it still needs to be done.

You are right: what would be needed is a minimal script that creates a 
webalizer conf (hum, it may not even be necessary; just one conffile + command 
line args may be enough). 

The robots.txt blocking access to search engines is a good idea, indeed.

However, a script handling webalizer conf creation should not touch cron. There 
should be only one cron entry; otherwise it would not be scalable.

--

As I understand your script, it would be called by the script that creates the 
homepage area at http://home.gna.org 

--

The preferred language is Perl, as all the other scripts are in Perl.
The scripts must use "use strict;" and should use "use Getopt::Long;" for 
command line arguments.

--

The best is to have a script that takes as arguments on the command line:

  --conffile= path to the configuration file, like /var/webalizer/group.conf
  --logfile= path to the Apache log, like /var/log/apache/bygroup/group.log
  --outputdir= path to the output directory for webalizer,
               like /var/www/group/webalizer/
  --title= title for the webalizer results

It would then be easy to adapt the script homepage-update.pl to call that 
script.
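
The command-line interface described above, sketched in Python with argparse 
as a stand-in for Perl's Getopt::Long. It simply maps the four options onto 
webalizer's -c (configuration file), -o (output directory) and -t (report 
title) switches and passes the log file last. The option names come from this 
thread; the function names are hypothetical.

```python
import argparse
import subprocess


def build_parser():
    """The four options proposed in this thread; paths in help are examples."""
    p = argparse.ArgumentParser(description="Run webalizer for one project.")
    p.add_argument("--conffile", required=True,
                   help="configuration file, e.g. /var/webalizer/group.conf")
    p.add_argument("--logfile", required=True,
                   help="Apache log, e.g. /var/log/apache/bygroup/group.log")
    p.add_argument("--outputdir", required=True,
                   help="webalizer output, e.g. /var/www/group/webalizer/")
    p.add_argument("--title", required=True,
                   help="title for the webalizer results")
    return p


def webalizer_argv(args):
    # -c config file, -o output directory, -t report title; the log file goes last.
    return ["webalizer", "-c", args.conffile, "-o", args.outputdir,
            "-t", args.title, args.logfile]


def main(argv=None):
    """Parse argv (default: sys.argv[1:]) and run webalizer once."""
    args = build_parser().parse_args(argv)
    return subprocess.run(webalizer_argv(args)).returncode
```

A standalone script would end with `raise SystemExit(main())`. The 
`--conffile=/path` spelling used above is accepted by argparse as well.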

No group info other than the system name is available. The home.gna.org 
system is in a chroot that has no access to the database or anything else. 

--

Another script (a trivial one) like homepage-update.pl should be written to run 
webalizer on each conffile, and that script will be run via cron. In fact, 
homepage-update.pl would only need very trivial changes for that.
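
A sketch of that single cron job, in Python. It assumes one <project>.conf per 
project in /var/webalizer, the directory used in the examples above. It also 
assumes each conffile carries its own LogFile and OutputDir settings, so only 
-c is passed, plus -q to keep cron mail quiet.

```python
import glob
import os
import subprocess


def run_all(conf_dir="/var/webalizer", run=subprocess.run):
    """Run webalizer once per <project>.conf: the single cron entry.

    Each conffile is assumed to set its own LogFile and OutputDir, so only
    -q (quiet) and -c are passed. One failing project does not stop the rest;
    the names of failed projects are returned.
    """
    failed = []
    for conf in sorted(glob.glob(os.path.join(conf_dir, "*.conf"))):
        project = os.path.splitext(os.path.basename(conf))[0]
        if run(["webalizer", "-q", "-c", conf]).returncode != 0:
            failed.append(project)
    return failed
```

A single crontab line then covers every project, for instance 
`15 4 * * * /usr/local/sbin/webalizer-all`, where the script path is 
hypothetical. Adding a project only means dropping a new conffile into the 
directory.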

--

So if you can write the first script, it would be easy for us to include it in 
our set of scripts.

If you want, we can give you write access to the CVS repository, which is not 
public (but all the code is GPL).

-------------------------------------------------------
Date: Fri 07.05.2004 at 10:36        By: David Jobet <djobet>
From my limited knowledge, setting up webalizer is not too complicated: we 
need to set up a webalizer.conf file, plus a cron job to launch webalizer 
once a day.

I guess we have to provide a tool (in which form? Perl? Bash? Other?) that 
creates the conf file from the project info (in which form can we retrieve the 
project info?).

One bothersome task with webalizer is regularly checking the logs to add 
IgnoreReferrer entries for xxx sites (they use webalizer pages as a way to 
increase their Google ranking).
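
A sketch of such a conf generator, written in Python although the thread 
settles on Perl. It only needs the project's unix name, which is the only 
project info available in the homepage chroot. It reuses the example paths 
from this thread and appends one IgnoreReferrer line per spam referrer. 
LogFile, OutputDir, ReportTitle, HostName, Incremental and IgnoreReferrer are 
webalizer.conf keywords; the values chosen here are assumptions.

```python
def webalizer_conf(project, spam_referrers=(), host="home.gna.org"):
    """Text of a webalizer.conf for one project, built from its unix name only."""
    lines = [
        f"LogFile      /var/log/apache/bygroup/{project}.log",
        f"OutputDir    /var/www/{project}/webalizer",
        # webalizer appends HostName to ReportTitle in the page titles.
        f"ReportTitle  Usage statistics for {project} on",
        f"HostName     {host}",
        # Keep state between runs, since each run may only see part of a month.
        "Incremental  yes",
    ]
    # One IgnoreReferrer line per referrer-spam pattern spotted in the logs.
    lines += [f"IgnoreReferrer {pat}" for pat in sorted(set(spam_referrers))]
    return "\n".join(lines) + "\n"
```

Keeping the spam list in one shared file and regenerating every conffile from 
it would turn the regular log check into a one-place edit.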

I guess we should add a robot.txt file forbidding the search engines to 
reference the webalizer page...



If you can tell me what kind of tool I can use and how I can get basic 
information on the project (such as the name, the path on the servers, ...) I 
can create a script that creates the 

- webalizer.conf

- add an entry in the cron



I think we will need to think if we want a referrer entry in the webalizer page 
(I think that's a cool feature because I can see who is talking/using of my 
project) and in that case how can we fight against porn sites.



See my last webalizer entry : http://www.nosica.net/webalizer/usage_200405.html




CC List
-------

CC Address                          | Comment
------------------------------------+-----------------------------
nicoo                               | any roadmap at this point?



Attachments
-------------------

-------------------------------------------------------
Date: Tue 01.06.2004 at 12:14  Name: hometraffic  Size: 4.75 KB   By: zerodeux

http://gna.org/task/download.php?item_id=119&item_file_id=16

-------------------------------------------------------
Date: Tue 18.05.2004 at 16:20  Name: homepage-update.pl  Size: 2.13 KB   By: yeupou
homepage update
http://gna.org/task/download.php?item_id=119&item_file_id=13






For detailed info, follow this link:
<http://gna.org/task/?func=detailitem&item_id=119>

_______________________________________________
  Message posted via Gna!
  http://gna.org/

