Re: [analog-help] server crashing down / big cache file

2004-05-14 Thread analog-help
Hi Jeremy,

> As I see it the problem is that you are trying to get Analog to process 
> a 385MB cache file with only 256MB of memory (although it should use 
> swap, but that will be really slow, etc.). There are two solutions:

In my experience, that is indeed really slow ;-)

> 1] (As Michael Hill suggested) put more memory on the machine. Really. I 
> know your provider says they won't do that, but 256MB is kind of limited 
> for a web server these days and certainly for a stats server, 
> especially as memory is so cheap.

No chance. I'm thinking about changing providers, but that's not important here.

> 2] Reduce the amount of processing that Analog does in memory. The cache 
> file is essentially a serialized dump of the memory structure that 
> Analog works on. If it's 385MB on disk, then Analog is going to need 
> about 400MB of memory to work on it. Sure you can swap to disk, but 
> that's really inefficient. Using LOWMEM commands won't help because you 
> are already using more memory than you appear to have available, short 
> of using LOWMEM 3, which removes the data. Here are some suggestions:

I will try some of your suggestions and hope I can get the problem under control. Thanks,
this is a really good mailing list.

One simple question: how do you handle sites that produce a lot of logs? Do you create
several cache files, e.g. one every week?

thanks again
Andi Leppert

P.S.: In a way I'm happy to have this problem - now I have a better understanding of
Analog, which is really a great program!

+
|  TO UNSUBSCRIBE from this list:
|http://lists.isite.net/listgate/analog-help/unsubscribe.html
|
|  Digest version: http://lists.isite.net/listgate/analog-help-digest/
|  Usenet version: news://news.gmane.org/gmane.comp.web.analog.general
|  List archives:  http://www.analog.cx/docs/mailing.html#listarchives
+


Re: [analog-help] server crashing down / big cache file

2004-05-13 Thread analog-help
As I see it the problem is that you are trying to get Analog to process 
a 385MB cache file with only 256MB of memory (although it should use 
swap, but that will be really slow, etc.). There are two solutions:

1] (As Michael Hill suggested) put more memory on the machine. Really. I 
know your provider says they won't do that, but 256MB is kind of limited 
for a web server these days and certainly for a stats server, 
especially as memory is so cheap.

2] Reduce the amount of processing that Analog does in memory. The cache 
file is essentially a serialized dump of the memory structure that 
Analog works on. If it's 385MB on disk, then Analog is going to need 
about 400MB of memory to work on it. Sure you can swap to disk, but 
that's really inefficient. Using LOWMEM commands won't help because you 
are already using more memory than you appear to have available, short 
of using LOWMEM 3, which removes the data. Here are some suggestions:

   * Don't run reports for the entire period, or limit the cache files.
     If you are reporting for only a week at a time, then store only
     that week's worth of data in the cache file, so that you don't
     have to reload log files for old weeks. You still won't be able to
     do the running total, but you might be able to write stats for the
     past quarter or year, which may be good enough. You can always
     archive the *reports* so your customers can look at old stats to
     do trend analysis. Often, looking at archived reports for each
     month is more valuable than a cumulative report.
   * Limit the data in the reports using EXCLUDE or ALIAS commands. If
     you can identify requests or referrers that produce a *large*
     number of unique entries, and can reduce that number with an alias
     (or remove the entries, if they are not important), that can help
     reduce memory consumption. For example, if you have requests with
     a unique session code in each request, use a FILEALIAS to remove
     that code from the data.
   * With careful use of scripts, you could run Analog several times on
     the data, each time using a different LOWMEM 3 command. Then, by
     re-combining the output of the reports (being careful to pull the
     right reports from the right runs), you can work around the loss
     of data. This requires a very good understanding of the
     report-to-data correlation and will also take about six times as
     long to run as it did before, but can reduce memory usage overall.
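
To illustrate the session-code suggestion: the sketch below is plain Python, not Analog itself, and only shows why a per-request session code inflates the number of unique entries and how aliasing it away collapses them. The parameter name `sid`, the sample URLs and the helper `strip_session` are all made up for the example; in Analog the equivalent rewriting would be done with a FILEALIAS in the configuration.

```python
import re

# Hypothetical session parameter name ("sid"); adjust to whatever the site uses.
SESSION_PARAM = re.compile(r"([?&])sid=[^&]*(&|$)")

def strip_session(url: str) -> str:
    """Remove the 'sid=...' query argument from a request URL."""
    def repl(match):
        # Keep the leading '?' or '&' only if another argument follows.
        return match.group(1) if match.group(2) == "&" else ""
    return SESSION_PARAM.sub(repl, url)

# Made-up sample requests: same two pages, different session codes.
requests = [
    "/shop/item.html?sid=a1b2c3",
    "/shop/item.html?sid=d4e5f6",
    "/shop/item.html?sid=0719aa",
    "/shop/cart.html?page=1&sid=a1b2c3",
    "/shop/cart.html?sid=d4e5f6&page=1",
]

unique_before = len(set(requests))
unique_after = len(set(strip_session(r) for r in requests))
print(f"unique entries before: {unique_before}, after: {unique_after}")
```

With these five sample requests the number of distinct entries drops from 5 to 2. On a real site with one session code per visitor the reduction would be proportionally larger, which is where the memory saving comes from.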
--

Jeremy Wadsack
MCS, LLC


Re: [analog-help] server crashing down / big cache file

2004-05-13 Thread analog-help
Hi Adalbert,

> actually I don't understand why you are using cache files. As I understand
> it, using caches is only useful when you can't for whatever reason archive
> your logfiles, and esp. gzip them so that their size is substantially
> reduced.

So you mean I should work with the growing number of log files, gzipped for
example by logrotate? That's an idea...

> Strangely, if analog needs 2 minutes for one day, it still can take 8
> hours for a full month (on a fairly recent PC with 768 MB memory usage
> will grow to 1.2 GB while analyzing a full month with analog) - but as
> I understand it the tables to store all those values are much bigger so it
> is inevitable.

In this paragraph, are you talking about using log files without the cache file or
using the cache file? Do I understand you correctly that it needs more disk space to
store the values in the cache file? (Which "tables" do you mean?)

thanks
Andi Leppert



Re: [analog-help] server crashing down / big cache file

2004-05-13 Thread analog-help
Hi,

actually I don't understand why you are using cache files. As I understand
it, using caches is only useful when you can't, for whatever reason, archive
your logfiles, and especially gzip them so that their size is substantially
reduced.

For the site where I run analog, I do experience longer runs if I decide
to do a monthly analysis, but for analysing just a day, one or two minutes
is all it takes.

Strangely, if analog needs 2 minutes for one day, it can still take 8
hours for a full month (on a fairly recent PC with 768 MB of memory, usage
grows to 1.2 GB while analyzing a full month with analog) - but as
I understand it, the tables to store all those values are much bigger, so it
is inevitable.
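
A quick back-of-the-envelope check of those numbers, as a small Python sketch. The 2 minutes and 8 hours are the figures quoted above; the 30-day month is an assumption made only for this calculation.

```python
# Figures from the message above; the 30-day month is an assumption.
minutes_per_day_run = 2
days_in_month = 30
observed_month_minutes = 8 * 60

# What a monthly run would take if run time scaled linearly with the log volume.
linear_estimate = minutes_per_day_run * days_in_month

# How much slower the observed monthly run is than that linear estimate.
slowdown = observed_month_minutes / linear_estimate

print(f"linear estimate: {linear_estimate} min, observed: {observed_month_minutes} min")
print(f"observed run is about {slowdown:.0f}x slower than linear scaling would predict")
```

A gap of this size would be consistent with the machine swapping once memory use (1.2 GB) exceeds the 768 MB of physical memory, although nothing in this thread confirms that this is the cause.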

Greetings,

Adalbert

--
Adalbert Duda [EMAIL PROTECTED]






Re: [analog-help] server crashing down / big cache file

2004-05-12 Thread analog-help
Hi Duke, 

Thanks again for your ideas.

> I wonder if turning off unnecessary reports might be helpful.

I think I have already turned off the unnecessary reports, but I'll check
this again.

> I wonder if using a value of 1 or 2 (instead of 3) might produce
> better results while still allowing your machine to finish the analysis.

My understanding is that 3 is the strictest of the three values. If, e.g.,
HOSTLOWMEM 3 is set and Analog finds a host in the log file, it just throws
it away.

> I wonder if using more LOWMEM commands (there are a
> total of 6) would further reduce file size and server load.

I've tested all six LOWMEM commands; the three I left in my config are the ones
that reduced the size of the cache file.

 
> Again, I don't use low memory commands, so I can provide
> no personal experience with them.  I can only offer ideas based
> on my understanding of the documentation.

thanks
 
Andi


Re: [analog-help] server crashing down / big cache file

2004-05-12 Thread analog-help
Andi,

   I know about the low memory commands from the Analog
documentation, but I don't use them.  Because the list is slow
(are people on holiday?), I will try to help.  After reading the
low memory page, I have three ideas in no particular order.

   I wonder if turning off unnecessary reports might be helpful.

   I wonder if using a value of 1 or 2 (instead of 3) might produce
better results while still allowing your machine to finish the analysis.

   I wonder if using more LOWMEM commands (there are a
total of 6) would further reduce file size and server load.

   Again, I don't use low memory commands, so I can provide
no personal experience with them.  I can only offer ideas based
on my understanding of the documentation.
-- Duke



Re: [analog-help] server crashing down / big cache file

2004-05-12 Thread analog-help
[EMAIL PROTECTED] wrote on 11.05.04 16:02:32:
> 
> The Analog documentation mentions low memory
> at "http://www.analog.cx/docs/lowmem.html".

Hi Duke,

Thanks for your reply. I read the documentation twice.

I tried the following:

HOSTLOWMEM 3
FILELOWMEM 3
REFLOWMEM 3
LOGFILE access_log
OUTPUT HTML
OUTFILE bla/out.html
CACHEFILE bla/cache
CACHEOUTFILE bla/cachenew
[...]

My goal: read all the data in and create a new, smaller cache file. It worked
perfectly: my cache is no longer 413 MB, but 1.4 MB. WOW!

BUT:
My client wants to have the following stats:
browser / filetype / directory stats

If I change my config to read only the cache file, it returns with: turning off empty
file report / turning off empty directory report!

That's logical, because I had FILELOWMEM 3 running... so I think there is no way out
for me.

My question now is: how do you handle such big sites, given that I want to have the
stats mentioned above? Or did I misunderstand something?

thanks in advance
greetings
andi leppert



Re: [analog-help] server crashing down / big cache file

2004-05-11 Thread analog-help
The Analog documentation mentions low memory
at "http://www.analog.cx/docs/lowmem.html".
HTH,

-- Duke

Andreas Leppert wrote:

Hello analog list,

sorry for posting in German earlier; I'm posting again in English.

I have a web server with several domains on it. Everything works fine, except that the major site on the server, which produces 30 GB of traffic per month, doesn't get along with analog. The nightly cron jobs run a script, which runs the analog commands.

Here is my analog.conf:
# Configuration file for analog 4.03
# Extra options used for generating the log data for OpCenterWeb
CACHEOUTFILE none
DNS NONE
#WARNINGS OFF
#APACHELOGFORMAT ("%h %l %u %t \"%r\" %>s %b \"%{User-Agent}i\"")
#LOGFORMAT (%S %l %j [%d/%M/%Y:%h:%n:%j] \"%j%w%r%wHTTP%j\" %c %b \"%B\")
LOGFORMAT COMBINED
LOGFILE none
OUTPUT COMPUTER
OUTFILE -
GENERAL ON
MONTHLY ON
MONTHCOLS RrPp
WEEKLY OFF
FULLDAILY OFF
DAILY ON
DAYCOLS RrPp
FULLHOURLY OFF
HOURLY ON
HOURCOLS RrPp
QUARTER OFF
FIVE OFF
HOST ON
HOSTCOLS RrPp
HOSTFLOOR 1:r
ORGANISATION ON
ORGCOLS RrPp
ORGFLOOR 1:r
DOMAIN ON
DOMCOLS RrPp
DOMSORTBY REQUESTS
DOMFLOOR 1:r
SUBDOMFLOOR 1:r
SUBDOMSORTBY REQUESTS
REQUEST OFF
DIRECTORY ON
DIRCOLS Rrb
FILETYPE ON
TYPECOLS Rrb
SIZE OFF
PROCTIME OFF
REDIR OFF
FAILURE ON
FAILCOLS Rr
REFERER OFF
REFSITE OFF
SEARCHQUERY OFF
SEARCHWORD OFF
REDIRREF OFF
FAILREF OFF
FULLBROWSER OFF
BROWSER ON
BROWFLOOR 1:r
BROWCOLS Rr
BROWSORTBY REQUESTS
SUBBROWSORTBY REQUESTS
SUBBROWFLOOR 1:r
OSREP OFF
VHOST OFF
USER OFF
FAILUSER OFF
STATUS OFF
HOSTLOWMEM 3

The script that calls analog:

#!/usr/bin/perl -w
use integer;

my($oneday) = 86400;
my($oneweek) = 604800;

# Run analog with the given command-line options
sub analyze {
    # print STDERR "$_[0]\n";
    if ($ENV{'OSTYPE'} =~ /^solaris/i) {
        system ("/usr/local/analog4.03/analog $_[0]");
    } else {
        system ("/usr/bin/analog $_[0]");
    }
}

#Process the command line arguments
if ($#ARGV != 1) {
    die "The number of arguments is wrong\nUsage updatestats statsdir logfile\n";
}
my($statdir,$logfile) = @ARGV;
my($analogconffile) = "$statdir/analog.cfg";
my($time) = time;
my($thisweekstat) = "$statdir/week".&getweeksbefore(0,$time);
my(@stattodelete) = (); #we sometimes do garbage collection
my($timeoption,$logoption);
my($weeklyoption) = "+g$analogconffile +C\"MONTHLY OFF\"";
my($cachefile) = "$statdir/cache";
my($runningoption) = "+g$analogconffile +C\"CACHEFILE $cachefile\"";

unless (stat ("$thisweekstat")) {
    # build the statistics for last week one more time
    my ($lastweekstat) = "$statdir/week".&getweeksbefore(1,$time);
    @stattodelete = ("$statdir/week".&getweeksbefore(4,$time),
                     "$statdir/week".&getweeksbefore(5,$time));
    #just playing it safe, normally the first should be enough
    $timeoption = "+F".&firstdayofweek(1,$time)." +T".&lastdayofweek(1,$time);
    $logoption = "+C\"LOGFILE $logfile\" +C\"LOGFILE $logfile.1\" +C\"LOGFILE $logfile.2\"";
    #normally the last two should be enough
    analyze("$weeklyoption $timeoption $logoption >$lastweekstat");
    if ( -e "$lastweekstat" ) {
        chmod(0640,"$lastweekstat") ;
    }
    #Get rid of "false empty reports"
    my ($hastogo)="";
    if (open (LASTWEEK,$lastweekstat)) {
        my ($line);
        while ($line = <LASTWEEK>) {
            next if ($line !~ /^x\sSR\s(\d+)/);
            last if ($1 != 0);
            # quoted so that this tests the rotated log "$logfile.1"
            last if (stat("$logfile.1"));
            $hastogo=1;
            last;
        }
        close(LASTWEEK);
        if ($hastogo) {unlink($lastweekstat)}
    }
}

#build statistics for this week
$timeoption = "+F".&firstdayofweek(0,$time)." +T".&lastdayofweek(0,$time);
$logoption = "+C\"LOGFILE $logfile\" +C\"LOGFILE $logfile.1\"";
analyze("$weeklyoption $timeoption $logoption >$thisweekstat");
if ( -e "$thisweekstat" ) {
    chmod(0640,"$thisweekstat");
}
if (@stattodelete) {
    map {unlink($_)} @stattodelete;
}

#Now we deal with the running total
#Update the cache if the logs have been rotated since last call
my (@statinfo);
if (@statinfo = stat ("$logfile.1")) {
    my ($logmtime,$cachemtime);
    $logmtime = $statinfo[9];
    if(@statinfo = stat ("$cachefile")) {
        $cachemtime = $statinfo[9];
    } else {
        $cachemtime = 0;
    }
    if ($cachemtime != $logmtime) {
        #build new cache
        analyze("$runningoption +C\"LOGFILE $logfile.1\" +C\"CACHEOUTFILE $cachefile.new\" +C\"OUTPUT NONE\"");
        utime($logmtime,$logmtime,"$cachefile.new");
        rename("$cachefile.new","$cachefile");
        if ( -e "$cachefile" ) {
            chmod(0640,"$cachefile");
        }
    }
}
analyze("$runningoption +C\"LOGFILE $logfile\" >$statdir/running");
if ( -e "$statdir/running" ) {
    chmod(0640,"$statdir/running") ;
}

#returns the number of the week of the year we were in x weeks before $time
sub getweeksbefore {
    my ($diff,$time) = @_;
    $time -= $oneweek*$diff; # $diff weeks ago
    my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst)= localtime($time);
    if ($wday > $yday) { #this week started last year, so we count it there
        ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst)= localtime($time-$oneweek);
        $yday+=7;
    }
    return 1+($yday-$wday)/7; #1..53
}

sub firstdayofweek {
    my ($diff,$time) = @_;
    $time -= $oneweek*$diff; # $diff weeks ago
    my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst)= localtime($time);
    $time -= $oneday*$wday; # last Sunday
($sec,$min,$hour,$mday,$mon,$year,$wda