Bug#590074: awstats: DO NOT use cron scripts to update stats database

2010-09-10 Thread Olivier Berger
On Fri, Jul 23, 2010 at 08:44:18PM +0200, Jonas Smedegaard wrote:
 On Fri, Jul 23, 2010 at 09:54:23PM +0400, Sergey B Kirpichev wrote:
 
 I guess because I haven't written a proper config file yet?
 Anyway, it's still spamming my syslog *every 10 minutes*. This
 should at least be an option that's off by default.
 
 It's a fresh install, right?
 
 On Fri, Jul 23, 2010 at 9:39 PM, Jonas Smedegaard jo...@jones.dk wrote:
 I experienced cron spam too when trying to install awstats
 recently (and too busy at the time to investigate further - just
 cursed a bit and uninstalled awstats again).
 
 Possibly not a helpful comment - just want to hint that there
 might actually be an issue of cron spam in virgin installs of
 awstats currently.
 
 I guess, we can disable cron jobs by default on a fresh install.
 As /etc/awstats/awstats.conf is not configured by default,
 
 A virgin install must not cause cron spam.  If you implicitly
 acknowledge above that awstats currently does, then yes, we should
 disable it by default (or figure out something more clever).
 

I'm also experiencing such spamming, and it gets even worse as it runs as 
www-data, and there's no /etc/aliases redirecting it to a real user by default 
in exim, it seems.

So there's an ever growing mailbox in /var/spool/mail/www-data :-(

See #496029 that seems to relate to the aliases problem.

Still, this needs to be addressed on the awstats side too, I guess.

Hope this helps.

Best regards,



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#590074: awstats: DO NOT use cron scripts to update stats database

2010-07-23 Thread Ximin Luo
Package: awstats
Version: 6.9.5~dfsg-2
Severity: important

Currently this package installs a cron job that runs every ten minutes. This
is a VERY bad idea:

- if logrotate(8) runs during those 10 minutes, some log entries will fail to
  be accounted for by awstats

- it wastes resources parsing the same log files every 10 minutes, especially
  if they get big

- it makes logcheck(8) spam my inbox every hour due to the cron job failing
  every 10 minutes

A better solution is to hook the update script onto the logrotate(8) entries
for any installed webservers (eg. /etc/logrotate.d/lighttpd,apache2). This
solves all of the 3 problems I just mentioned.


-- System Information:
Debian Release: squeeze/sid
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'unstable'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_GB.utf8, LC_CTYPE=en_GB.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages awstats depends on:
ii  perl                 5.10.1-13  Larry Wall's Practical Extraction 

Versions of packages awstats recommends:
ii  libnet-xwhois-perl   0.90-3     Whois Client Interface for Perl5

Versions of packages awstats suggests:
pn  libgeo-ipfree-perl   <none>     (no description available)
ii  libnet-dns-perl      0.66-2     Perform DNS queries from a Perl sc
ii  libnet-ip-perl       1.25-2     Perl extension for manipulating IP
ii  liburi-perl          1.54-1     module to manipulate and access UR
ii  lighttpd [httpd]     1.4.26-3   A fast webserver with minimal memo

-- no debconf information






Bug#590074: [Pkg-awstats-devel] Bug#590074: awstats: DO NOT use cron scripts to update stats database

2010-07-23 Thread Jonas Smedegaard

Hi Ximin,

On Fri, Jul 23, 2010 at 01:31:57PM +0100, Ximin Luo wrote:

Currently this package installs a cron job that runs every ten minutes. 
This is a VERY bad idea:


- if logrotate(8) runs during those 10 minutes, some log entries will fail to
 be accounted for by awstats

- it wastes resources parsing the same log files every 10 minutes, especially
 if they get big

- it makes logcheck(8) spam my inbox every hour due to the cron job failing
 every 10 minutes

A better solution is to hook the update script onto the logrotate(8) entries
for any installed webservers (eg. /etc/logrotate.d/lighttpd,apache2). This
solves all of the 3 problems I just mentioned.


Good points!

Frequent updates of logfiles have their uses too, however.  But not always 
- and the downsides you raise here are valid.


I suggest to a) split the current cron job into infrequent and frequent 
jobs, b) make the frequent one optional (ideally through debconf), and 
c) invoke the infrequent job also (or instead?) as a logrotate hook.



How does that sound?


 - Jonas

--
 * Jonas Smedegaard - idealist  Internet-arkitekt
 * Tlf.: +45 40843136  Website: http://dr.jones.dk/

 [x] quote me freely  [ ] ask before reusing  [ ] keep private




Bug#590074: [Pkg-awstats-devel] Bug#590074: awstats: DO NOT use cron scripts to update stats database

2010-07-23 Thread Ximin Luo

On 23/07/10 14:31, Jonas Smedegaard wrote:

I suggest to a) split the current cron job into infrequent and frequent
jobs, b) make the frequent one optional (ideally through debconf), and
c) invoke the infrequent job also (or instead?) as a logrotate hook.


How does that sound?


Yeah, that works. Though looking at the current script, there's not really much 
need to split it up.


The main problem is to do the logrotate hook itself - ideally you'd add it 
directly to the webserver entry rather than a new awstats entry. Is that going 
to be a pain - editing another package's configuration files? I dunno what 
infrastructure / policy Debian has for this sort of thing.







Bug#590074: [Pkg-awstats-devel] Bug#590074: Bug#590074: awstats: DO NOT use cron scripts to update stats database

2010-07-23 Thread Jonas Smedegaard

On Fri, Jul 23, 2010 at 02:57:43PM +0100, Ximin Luo wrote:

On 23/07/10 14:31, Jonas Smedegaard wrote:
I suggest to a) split the current cron job into infrequent and 
frequent jobs, b) make the frequent one optional (ideally through 
debconf), and c) invoke the infrequent job also (or instead?) as a 
logrotate hook.



How does that sound?


Yeah, that works. Though looking at the current script, there's not 
really much need to split it up.


The main problem is to do the logrotate hook itself - ideally you'd add 
it directly to the webserver entry rather than a new awstats entry. Is 
that going to be a pain - editing another package's configuration 
files? I dunno what infrastructure / policy Debian has for this sort of 
thing.


Ah, yes.  That was it: Not doable!

Or more accurately: It needs to be implemented in the webserver packages, 
so that this awstats package can hook into it - not doable in awstats alone.



 - Jonas





Bug#590074: [Pkg-awstats-devel] Bug#590074: Bug#590074: awstats: DO NOT use cron scripts to update stats database

2010-07-23 Thread Ximin Luo

clone 590074 -1 -2 -3
reassign -1 apache2
reassign -2 lighttpd
reassign -3 nginx
retitle -1 add pre-rotate hook to logrotate script
retitle -2 add pre-rotate hook to logrotate script
retitle -3 add pre-rotate hook to logrotate script
severity -1 wishlist
severity -2 wishlist
severity -3 wishlist
thanks

Please add something like the following snippet to the logrotate script for 
your package:


prerotate
    if [ -d /etc/logrotate.d/httpd-prerotate ]; then \
        run-parts /etc/logrotate.d/httpd-prerotate; \
    fi; \
endscript

(or some suitable directory other than the one suggested; I'm not sure what 
Debian naming conventions are.)


This would be greatly helpful to log-parsing packages such as awstats, which 
can then set up hooks to process these logs before they get rotated (see 
#590074 for an example).


X










Bug#590074: [Pkg-awstats-devel] Bug#590074: awstats: DO NOT use cron scripts to update stats database

2010-07-23 Thread Sergey B Kirpichev
On Fri, Jul 23, 2010 at 4:31 PM, Ximin Luo infini...@gmx.com wrote:
 Currently this package installs a cron job that runs every ten minutes. This
 is a VERY bad idea:

 - if logrotate(8) runs during those 10 minutes, some log entries will fail to
  be accounted for by awstats

logrotate every 10 minutes - could be the source of trouble.  Not awstats.

 - it wastes resources parsing the same log files every 10 minutes, especially
  if they get big

Do you mean it parses the _same_ log entries?  No, awstats doesn't do
such a stupid thing.  Actually, it does an lseek on the file to the last
known entry and then begins parsing.

 - it makes logcheck(8) spam my inbox every hour due to the cron job failing
  every 10 minutes

Why exactly does it fail?  Did you first try commenting out the crontab
entry and fixing the source of the failure?

 A better solution is to hook the update script onto the logrotate(8) entries
 for any installed webservers (eg. /etc/logrotate.d/lighttpd,apache2). This
 solves all of the 3 problems I just mentioned.

 Package: awstats
 Version: 6.9.5~dfsg-2
 Severity: important

I disagree with the severity.  This looks like a very
site-specific/workload-specific issue.  Your logrotate-based solution
could be suggested as an option in README.Debian for specific setups.

On Fri, Jul 23, 2010 at 5:31 PM, Jonas Smedegaard jo...@jones.dk wrote:
 Frequent updates of logfiles have their uses too, however.  But not always -
 and the downsides you raise here are valid.

True.

 I suggest to a) split the current cron job into infrequent and frequent
 jobs, b) make the frequent one optional (ideally through debconf), and c)
 invoke the infrequent job also (or instead?) as a logrotate hook.

How to split a) or c)?  It's easy only from the local admin side.

We could make the cron job frequency configurable through debconf.  Is that an option?






Bug#590074: [Pkg-awstats-devel] Bug#590074: awstats: DO NOT use cron scripts to update stats database

2010-07-23 Thread Ximin Luo

On 23/07/10 17:46, Sergey B Kirpichev wrote:

On Fri, Jul 23, 2010 at 4:31 PM, Ximin Luo infini...@gmx.com wrote:

Currently this package installs a cron job that runs every ten minutes. This
is a VERY bad idea:

- if logrotate(8) runs during those 10 minutes, some log entries will fail to
  be accounted for by awstats


logrotate every 10 minutes - could be the source of trouble.  Not awstats.



No, logrotate isn't running every 10 minutes. I think you misunderstood my 
point. If logrotate runs between the 10-minute cron runs of awstats, it will 
rotate the log entries since the last 10-minute run, so the next 10-minute run 
won't be able to see it any more.



- it wastes resources parsing the same log files every 10 minutes, especially
  if they get big


Do you mean it parses the _same_ log entries?  No, awstats doesn't do
such a stupid thing.  Actually, it does an lseek on the file to the last
known entry and then begins parsing.



What if the file has been truncated or removed by logrotate?


- it makes logcheck(8) spam my inbox every hour due to the cron job failing
  every 10 minutes


Why exactly does it fail?  Did you first try commenting out the crontab
entry and fixing the source of the failure?



I guess because I haven't written a proper config file yet? Anyway, it's still 
spamming my syslog *every 10 minutes*. This should at least be an option that's 
off by default.



A better solution is to hook the update script onto the logrotate(8) entries
for any installed webservers (eg. /etc/logrotate.d/lighttpd,apache2). This
solves all of the 3 problems I just mentioned.



Package: awstats
Version: 6.9.5~dfsg-2
Severity: important


I disagree with the severity.  This looks like a very
site-specific/workload-specific issue.  Your logrotate-based solution
could be suggested as an option in README.Debian for specific setups.



logrotate is part of the standard install for Debian webservers (at least 
apache2 and lighttpd). this is not site specific.



On Fri, Jul 23, 2010 at 5:31 PM, Jonas Smedegaard jo...@jones.dk wrote:

Frequent updates of logfiles have their uses too, however.  But not always -
and the downsides you raise here are valid.


True.


I suggest to a) split the current cron job into infrequent and frequent
jobs, b) make the frequent one optional (ideally through debconf), and c)
invoke the infrequent job also (or instead?) as a logrotate hook.


How to split a) or c)?  It's easy only from the local admin side.

We could make the cron job frequency configurable through debconf.  Is that an option?








Bug#590074: [Pkg-awstats-devel] Bug#590074: Bug#590074: awstats: DO NOT use cron scripts to update stats database

2010-07-23 Thread Jonas Smedegaard

On Fri, Jul 23, 2010 at 08:46:27PM +0400, Sergey B Kirpichev wrote:

On Fri, Jul 23, 2010 at 4:31 PM, Ximin Luo infini...@gmx.com wrote:
Currently this package installs a cron job that runs every ten 
minutes. This is a VERY bad idea:


- if logrotate(8) runs during those 10 minutes, some log entries will 
fail to  be accounted for by awstats


logrotate every 10 minutes - could be the source of trouble.  Not 
awstats.


Looks like possible language problem:

 during != every

 during ~= in between

If this didn't help, please follow-up on Ximin's response instead of 
mine :-)



- it makes logcheck(8) spam my inbox every hour due to the cron job 
failing  every 10 minutes


Why exactly does it fail?  Did you first try commenting out the crontab 
entry and fixing the source of the failure?


I experienced cron spam too when trying to install awstats recently (and 
too busy at the time to investigate further - just cursed a bit and 
uninstalled awstats again).


Possibly not a helpful comment - just want to hint that there might 
actually be an issue of cron spam in virgin installs of awstats 
currently.




On Fri, Jul 23, 2010 at 5:31 PM, Jonas Smedegaard jo...@jones.dk 
wrote:


I suggest to a) split the current cron job into infrequent and 
frequent jobs, b) make the frequent one optional (ideally through 
debconf), and c) invoke the infrequent job also (or instead?) as a 
logrotate hook.


How to split a) or c)?  It's easy only from the local admin side.


I must admit that I have lost track of your most recent improvements, 
but I seem to recall that in the past it made sense for my local scripts 
to distinguish between heavier monthly/weekly log analysis routines and 
smaller hourly ones.  But perhaps that was because I (for other reasons) 
analyzed the files from scratch again each month...


Let's first figure out whether the current frequent cron job really is 
heavy on system resources; only if it is will I try to elaborate on my 
ideas here.




We could make the cron job frequency configurable through debconf.  Is that an option?


I would prefer to keep the cron file as a conffile and instead have the 
invoked script check a flag in /etc/default/awstats if it should really 
run or just quit immediately.


But again, let's first resolve if it really is necessary.


 - Jonas





Bug#590074: [Pkg-awstats-devel] Bug#590074: awstats: DO NOT use cron scripts to update stats database

2010-07-23 Thread Sergey B Kirpichev
 No, logrotate isn't running every 10 minutes. I think you misunderstood my
 point. If logrotate runs between the 10-minute cron runs of awstats, it will
 rotate the log entries since the last 10-minute run, so the next 10-minute
 run won't be able to see it any more.

That's true.  But it's a known issue and your logrotate hint is
already documented in README.Debian for this purpose.

On Fri, Jul 23, 2010 at 9:39 PM, Jonas Smedegaard jo...@jones.dk wrote:
 Looks like possible language problem:
  during != every
  during ~= in between

Probably ;)

 - it wastes resources parsing the same log files every 10 minutes,
 especially
  if they get big

 Do you mean it parses the _same_ log entries?  No, awstats doesn't do
 such a stupid thing.  Actually, it does an lseek on the file to the last
 known entry and then begins parsing.


 What if the file has been truncated or removed by logrotate?

In this case it starts from the first line of course...
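
The behaviour described here (resume from the last known position, start over if the file shrank) can be sketched as follows. This is not awstats code, only an illustration of the idea; the function name and the state file are hypothetical. It ignores data appended while it is running, and it cannot detect a log that was rotated and has since grown past the saved offset.

```shell
#!/bin/sh
# read_new_entries LOGFILE STATEFILE
# Prints the bytes appended to LOGFILE since the previous call and
# records the new end-of-file offset in STATEFILE.
read_new_entries() {
    log="$1"
    state="$2"
    offset=0
    if [ -r "$state" ]; then
        offset=$(cat "$state")
    fi
    size=$(wc -c < "$log" | tr -d ' ')
    # A file shorter than the saved offset was truncated or replaced
    # (e.g. by logrotate), so start again from the first byte.
    if [ "$size" -lt "$offset" ]; then
        offset=0
    fi
    # tail -c +N starts output at byte N (1-based).
    tail -c +"$((offset + 1))" "$log"
    printf '%s\n' "$size" > "$state"
}
```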

Please consider enabling the EnableLockForUpdate feature.  From the README.Debian:
8<--------
Also consider enabling lock files in /etc/awstats/awstats.conf with
EnableLockForUpdate=1 so that only one AWStats update process is
running at a time.  This will reduce system resources especially if
the AWStats update process takes longer than 10 minutes to complete.
This solution has some security drawbacks: lockfile with well-known
name and writable by www-data user.
-------->8

 I guess because I haven't written a proper config file yet? Anyway, it's
 still spamming my syslog *every 10 minutes*. This should at least be an
 option that's off by default.

It's a fresh install, right?

On Fri, Jul 23, 2010 at 9:39 PM, Jonas Smedegaard jo...@jones.dk wrote:
 I experienced cron spam too when trying to install awstats recently (and too
 busy at the time to investigate further - just cursed a bit and uninstalled
 awstats again).

 Possibly not a helpful comment - just want to hint that there might actually
 be an issue of cron spam in virgin installs of awstats currently.

I guess, we can disable cron jobs by default on a fresh install.  As
/etc/awstats/awstats.conf is not configured by default,

 I disagree with the severity.  This looks like a very
 site-specific/workload-specific issue.  Your logrotate-based solution
 could be suggested as an option in README.Debian for specific setups.


 logrotate is part of the standard install for Debian webservers (at least
 apache2 and lighttpd). this is not site specific.

Yes, but your logrotate settings are very site-specific and far, far
away from the defaults...






Bug#590074: [Pkg-awstats-devel] Bug#590074: awstats: DO NOT use cron scripts to update stats database

2010-07-23 Thread Ximin Luo

On 23/07/10 18:54, Sergey B Kirpichev wrote:

I guess because I haven't written a proper config file yet? Anyway, it's
still spamming my syslog *every 10 minutes*. This should at least be an
option that's off by default.


It's a fresh install, right?



yeah.


logrotate is part of the standard install for Debian webservers (at least
apache2 and lighttpd). this is not site specific.


Yes, but your logrotate settings are very site-specific and far, far
away from the defaults...


Where did I suggest that I edited my logrotate scripts? They are unchanged 
since being installed...


Or do you mean the solution I proposed? AFAICS Debian utility packages normally 
assume they're going to be used on/by other Debian packages, so it's fine to 
assume that awstats is being installed for the logs on some local Debian 
webserver package.


In fact, the solution I described in the cloned bug reports above won't add 
any extra effort to awstats maintenance:


- awstats adds some update scripts into DIR. job done on the awstats side.
- default logrotate scripts of various webservers call DIR when rotating 
logs. (what I made those cloned reports for)
- if a site admin wants to use non-default log settings, then they'll need to 
edit their logrotate scripts anyhow.
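
A rough sketch of what the awstats side of those steps could look like: a hook placed in DIR that updates every configured site before rotation. The function name and the /usr/lib/cgi-bin/awstats.pl path are assumptions for illustration; it also assumes per-site configs named awstats.SITE.conf, so the unconfigured default awstats.conf is skipped.

```shell
#!/bin/sh
# update_all_sites CONF_DIR UPDATER
# Runs UPDATER once per awstats.SITE.conf found in CONF_DIR.
# Both arguments are illustrative; the paths below are assumed.
update_all_sites() {
    conf_dir="$1"     # e.g. /etc/awstats
    updater="$2"      # e.g. /usr/lib/cgi-bin/awstats.pl (assumed path)
    for conf in "$conf_dir"/awstats.*.conf; do
        [ -e "$conf" ] || continue       # the glob matched nothing
        site=${conf##*/awstats.}         # strip directory and prefix
        site=${site%.conf}               # strip suffix
        "$updater" -config="$site" -update >/dev/null
    done
}

# As a hook run by run-parts, the file would end with something like:
#   update_all_sites /etc/awstats /usr/lib/cgi-bin/awstats.pl
```

Note that run-parts skips file names containing dots, so the hook file itself would need a name such as "awstats".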


X






Bug#590074: awstats: DO NOT use cron scripts to update stats database

2010-07-23 Thread Jonas Smedegaard

On Fri, Jul 23, 2010 at 09:54:23PM +0400, Sergey B Kirpichev wrote:

I guess because I haven't written a proper config file yet? Anyway, 
it's still spamming my syslog *every 10 minutes*. This should at 
least be an option that's off by default.


It's a fresh install, right?

On Fri, Jul 23, 2010 at 9:39 PM, Jonas Smedegaard jo...@jones.dk wrote:
I experienced cron spam too when trying to install awstats recently 
(and too busy at the time to investigate further - just cursed a bit 
and uninstalled awstats again).


Possibly not a helpful comment - just want to hint that there might 
actually be an issue of cron spam in virgin installs of awstats 
currently.


I guess, we can disable cron jobs by default on a fresh install.  As 
/etc/awstats/awstats.conf is not configured by default,


A virgin install must not cause cron spam.  If you implicitly 
acknowledge above that awstats currently does, then yes, we should 
disable it by default (or figure out something more clever).



I disagree with the severity.  This looks like a very 
site-specific/workload-specific issue.  Your logrotate-based 
solution could be suggested as an option in README.Debian for 
specific setups.




logrotate is part of the standard install for Debian webservers (at 
least apache2 and lighttpd). this is not site specific.


Yes, but your logrotate settings are very site-specific and far, far 
away from the defaults...


I fail to understand: Did I miss something and we have already been 
provided the specific logrotate config of that host?


If you are simply guessing, I suggest you state that more clearly, and 
be kinder about alternative viewpoints here. :-)



 - Jonas


