Re: [analog-help] server crashing down / big cache file
Hi Jeremy,

> As I see it the problem is that you are trying to get Analog to process
> a 385MB cache file with only 256MB of memory (although it should use
> swap, that will be really slow, etc.). There are two solutions:

I can confirm that this is really slow ;-)

> 1] (As Michael Hill suggested) put more memory in the machine. Really. I
> know your provider says they won't do that, but 256MB is kind of limited
> for a web server these days and certainly for a stats server,
> especially as memory is so cheap.

No chance. I'm thinking about changing providers - but that's not important here.

> 2] Reduce the amount of processing that Analog does in memory. The cache
> file is essentially a serialized dump of the memory structure that
> Analog works on. If it's 385MB on disk, then Analog is going to need
> about 400MB of memory to work on it. Sure, you can swap to disk, but
> that's really inefficient. Using LOWMEM commands won't help because you
> are already using more memory than you appear to have available, short
> of using LOWMEM 3, which removes the data. Here are some suggestions:

I will try some of your suggestions - I hope they will solve the problem.
Thanks, this is a really good mailing list.

One simple question: how do you handle sites producing a lot of logs? Do
you create several cache files, e.g. one per week?

Thanks again,
Andi Leppert

P.S.: Somehow I'm happy to have this problem - now I've got a better
understanding of Analog, which is really a great program!

+------------------------------------------------------------------------
| TO UNSUBSCRIBE from this list:
|   http://lists.isite.net/listgate/analog-help/unsubscribe.html
|
| Digest version: http://lists.isite.net/listgate/analog-help-digest/
| Usenet version: news://news.gmane.org/gmane.comp.web.analog.general
| List archives:  http://www.analog.cx/docs/mailing.html#listarchives
+------------------------------------------------------------------------
Re: [analog-help] server crashing down / big cache file
As I see it the problem is that you are trying to get Analog to process a
385MB cache file with only 256MB of memory (although it should use swap,
that will be really slow, etc.). There are two solutions:

1] (As Michael Hill suggested) put more memory in the machine. Really. I
know your provider says they won't do that, but 256MB is kind of limited
for a web server these days and certainly for a stats server, especially
as memory is so cheap.

2] Reduce the amount of processing that Analog does in memory. The cache
file is essentially a serialized dump of the memory structure that Analog
works on. If it's 385MB on disk, then Analog is going to need about 400MB
of memory to work on it. Sure, you can swap to disk, but that's really
inefficient. Using LOWMEM commands won't help because you are already
using more memory than you appear to have available, short of using
LOWMEM 3, which removes the data. Here are some suggestions:

* Don't run reports for the entire period, or limit the cache files. If
you are reporting for only a week at a time, then store only that week's
worth of data in the cache file, and you won't have to reload log files
for old weeks. You still won't be able to do the running total, but you
might be able to write stats for the past quarter or year, which may be
good enough. You can always archive the *reports* so your customers can
look at old stats and do trend analysis. Often, looking at archived
reports for each month is more valuable than a cumulative report.

* Limit the data in the reports using EXCLUDE or ALIAS commands. If you
can identify requests or referrers that produce a *large* number of
unique entries, and can collapse them with an alias or remove them if
they are not important, that can help reduce memory consumption. For
example, if you have requests with a unique session code in each request,
use a FILEALIAS to remove that code from the data.
* With careful use of scripts, you could run Analog several times on the
same data, each time using a different LOWMEM 3 command. Then, by
re-combining the output of the reports (being careful to pull the right
reports from the right runs), you can work around the loss-of-data
problem. This requires a very good understanding of which report depends
on which data, and it will also take about six times as long to run as it
previously did, but it can reduce peak memory usage.

-- 
Jeremy Wadsack
MCS, LLC
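Jeremy's FILEALIAS/EXCLUDE suggestion might look like this in a config
file; a minimal sketch, where the script name, the `session` argument, and
the ad-server referrer are hypothetical (check the exact wildcard rules in
the Analog documentation for your version):

```
# analog.cfg fragment (hypothetical names)
# Collapse every request for /app.cgi with a session argument into one
# entry, so each unique session no longer gets its own line in the tables.
FILEALIAS /app.cgi?session=* /app.cgi

# Drop a noisy referrer entirely if it is not needed in the reports.
REFEXCLUDE http://adserver.example.com/*
```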
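The multi-run idea above could be scripted along these lines. This is a
sketch only: the grouping of reports per run, the file names, and the
choice of which LOWMEM commands to drop in each pass are assumptions, not
taken from the thread.

```python
# Sketch: run Analog once per report group, setting LOWMEM 3 for every
# data type that group does NOT need, then collect the per-run outputs.
# Each run therefore keeps only one data type fully in memory.
RUNS = {
    # run name -> LOWMEM 3 commands to pass for the data we can discard
    "hosts":    ["FILELOWMEM 3", "REFLOWMEM 3", "BROWLOWMEM 3"],
    "files":    ["HOSTLOWMEM 3", "REFLOWMEM 3", "BROWLOWMEM 3"],
    "browsers": ["HOSTLOWMEM 3", "FILELOWMEM 3", "REFLOWMEM 3"],
}

def build_commands(logfile, outdir):
    """Return one analog command line per run; outputs land in outdir."""
    cmds = []
    for name, drops in RUNS.items():
        lowmem = " ".join(f'+C"{d}"' for d in drops)
        cmds.append(f'analog +C"LOGFILE {logfile}" {lowmem} '
                    f'+C"OUTFILE {outdir}/{name}.html"')
    return cmds

if __name__ == "__main__":
    for cmd in build_commands("access_log", "stats"):
        print(cmd)
```

Each run then produces a report file from which only the relevant report
sections would be pulled and re-combined, as Jeremy describes.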
Re: [analog-help] server crashing down / big cache file
Hi Adalbert,

> actually I don't understand why you are using cache files. As I understand
> it, using caches is only useful when you can't, for whatever reason,
> archive your logfiles, and esp. gzip them so that their size is
> substantially reduced.

So you mean I should work with the growing number of log files, which are
gzipped, for example, by logrotate? That's an idea...

> Strangely, if analog needs 2 minutes for one day, it can still take 8
> hours for a full month (on a fairly recent PC with 768 MB, memory usage
> will grow to 1.2 GB while analyzing a full month with analog) - but as
> I understand it the tables to store all those values are much bigger, so
> it is inevitable.

In this paragraph, are you talking about using log files without the cache
file, or using the cache file? Did I understand you correctly that it needs
more disk space to store the values in the cache file? (Which "tables" do
you mean?)

Thanks,
Andi Leppert
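For reference, the rotate-and-gzip setup Andi mentions would look roughly
like this in logrotate; a sketch with a hypothetical log path (the
directives themselves are standard logrotate options):

```
# /etc/logrotate.d/httpd (hypothetical path)
/var/log/httpd/access_log {
    weekly
    rotate 52        # keep a year of rotated logs
    compress         # gzip rotated files
    delaycompress    # leave the newest rotation uncompressed
    missingok
    notifempty
}
```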
Re: [analog-help] server crashing down / big cache file
Hi,

actually I don't understand why you are using cache files. As I understand
it, using caches is only useful when you can't, for whatever reason,
archive your logfiles, and esp. gzip them so that their size is
substantially reduced. For the site where I run analog, I do experience
longer runs if I decide to do a monthly analysis, but for analysing just a
day, one or two minutes is all it takes. Strangely, if analog needs 2
minutes for one day, it can still take 8 hours for a full month (on a
fairly recent PC with 768 MB, memory usage will grow to 1.2 GB while
analyzing a full month with analog) - but as I understand it the tables to
store all those values are much bigger, so it is inevitable.

Greetings,
Adalbert

-- 
Adalbert Duda
[EMAIL PROTECTED]

> Hi Duke,
>
> thanks again for your ideas.
>
>> I wonder if turning off unnecessary reports might be helpful.
>
> I will test this, but I think I have already turned off the unnecessary
> reports. But I'll check this.
>
>> I wonder if using a value of 1 or 2 (instead of 3) might produce
>> better results while still allowing your machine to finish the
>> analysis.
>
> My understanding is that 3 is the strictest of the three values. If e.g.
> HOSTLOWMEM 3 is set, and analog finds a host in the log file, then it
> just throws it away.
>
>> I wonder if using more LOWMEM commands (there are a
>> total of 6) would further reduce file size and server load.
>
> I've tested all six LOWMEM commands; the three I left in my config
> reduced the size of the cache file.
>
>> Again, I don't use low memory commands, so I can provide
>> no personal experience with them. I can only offer ideas based
>> on my understanding of the documentation.
>
> Thanks,
> Andi
Re: [analog-help] server crashing down / big cache file
Hi Duke,

thanks again for your ideas.

> I wonder if turning off unnecessary reports might be helpful.

I will test this, but I think I have already turned off the unnecessary
reports. But I'll check this.

> I wonder if using a value of 1 or 2 (instead of 3) might produce
> better results while still allowing your machine to finish the analysis.

My understanding is that 3 is the strictest of the three values. If e.g.
HOSTLOWMEM 3 is set, and analog finds a host in the log file, then it just
throws it away.

> I wonder if using more LOWMEM commands (there are a
> total of 6) would further reduce file size and server load.

I've tested all six LOWMEM commands; the three I left in my config reduced
the size of the cache file.

> Again, I don't use low memory commands, so I can provide
> no personal experience with them. I can only offer ideas based
> on my understanding of the documentation.

Thanks,
Andi
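For readers following along: per my reading of docs/lowmem.html, each of
the six LOWMEM commands takes a level from 0 (default) to 3, and the
levels can be mixed per data type. A sketch (the memory/speed tradeoff
wording for levels 1 and 2 is paraphrased from the docs, not measured):

```
# analog.cfg sketch of the three non-default low-memory levels
HOSTLOWMEM 1   # use less memory for hosts; host report still produced
FILELOWMEM 2   # use even less memory for filenames, at a speed cost
REFLOWMEM 3    # don't store referrers at all; referrer reports are lost
```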
Re: [analog-help] server crashing down / big cache file
Andi,

I know about the low memory commands from the Analog documentation, but I
don't use them. Because the list is slow (are people on holiday?), I will
try to help. After reading the low memory page, I have three ideas, in no
particular order.

I wonder if turning off unnecessary reports might be helpful.

I wonder if using a value of 1 or 2 (instead of 3) might produce better
results while still allowing your machine to finish the analysis.

I wonder if using more LOWMEM commands (there are a total of 6) would
further reduce file size and server load.

Again, I don't use low memory commands, so I can provide no personal
experience with them. I can only offer ideas based on my understanding of
the documentation.

-- 
Duke

Andreas Leppert wrote:

> [EMAIL PROTECTED] wrote on 11.05.04 16:02:32:
>
>> The Analog documentation mentions low memory
>> at http://www.analog.cx/docs/lowmem.html.
>
> Hi Duke,
>
> thanks for your reply, I read the documentation twice. I tried the
> following:
>
> HOSTLOWMEM 3
> FILELOWMEM 3
> REFLOWMEM 3
> LOGFILE access_log
> OUTPUT HTML
> OUTFILE bla/out.html
> CACHEFILE bla/cache
> CACHEOUTFILE bla/cachenew
> [...]
>
> My goal: read all the data in and create a new, smaller cache file. It
> worked perfectly; now my cache isn't 413 MB big, but 1.4MB. WOW!
>
> BUT: my client wants to have the following stats: browser / filetype /
> directory stats. If I change my config to read only the cache file, it
> returns with: "turning off empty file report / turning off empty
> directory report"! That's logical, because I had FILELOWMEM 3 running...
> so I think there is no way out for me.
>
> My question is now: how do you handle such big sites, given that I want
> to have the stats mentioned above? Or did I misunderstand something?
>
> Thanks in advance, greetings,
> Andi Leppert
Re: [analog-help] server crashing down / big cache file
[EMAIL PROTECTED] wrote on 11.05.04 16:02:32:

> The Analog documentation mentions low memory
> at http://www.analog.cx/docs/lowmem.html.

Hi Duke,

thanks for your reply, I read the documentation twice. I tried the
following:

HOSTLOWMEM 3
FILELOWMEM 3
REFLOWMEM 3
LOGFILE access_log
OUTPUT HTML
OUTFILE bla/out.html
CACHEFILE bla/cache
CACHEOUTFILE bla/cachenew
[...]

My goal: read all the data in and create a new, smaller cache file. It
worked perfectly; now my cache isn't 413 MB big, but 1.4MB. WOW!

BUT: my client wants to have the following stats: browser / filetype /
directory stats. If I change my config to read only the cache file, it
returns with: "turning off empty file report / turning off empty directory
report"! That's logical, because I had FILELOWMEM 3 running... so I think
there is no way out for me.

My question is now: how do you handle such big sites, given that I want to
have the stats mentioned above? Or did I misunderstand something?

Thanks in advance, greetings,
Andi Leppert
Re: [analog-help] server crashing down / big cache file
The Analog documentation mentions low memory
at http://www.analog.cx/docs/lowmem.html.

HTH,
-- 
Duke

Andreas Leppert wrote:

hello analog list,

sorry for posting in German before, I'm posting again in English.

I have a webserver with several domains on it. Everything works pretty
well, but the major site on the server, which produces 30GB of traffic per
month, doesn't get along with Analog. The nightly cronjobs run a script
which runs analog commands. Here is my analog.conf:

# Configuration file for analog 4.03
# Extra options used for generating the log data for OpCenterWeb
CACHEOUTFILE none
DNS NONE
#WARNINGS OFF
#APACHELOGFORMAT ("%h %l %u %t \"%r\" %>s %b \"%{User-Agent}i\"")
#LOGFORMAT (%S %l %j [%d/%M/%Y:%h:%n:%j] \"%j%w%r%wHTTP%j\" %c %b \"%B\")
LOGFORMAT COMBINED
LOGFILE none
OUTPUT COMPUTER
OUTFILE -
GENERAL ON
MONTHLY ON
MONTHCOLS RrPp
WEEKLY OFF
FULLDAILY OFF
DAILY ON
DAYCOLS RrPp
FULLHOURLY OFF
HOURLY ON
HOURCOLS RrPp
QUARTER OFF
FIVE OFF
HOST ON
HOSTCOLS RrPp
HOSTFLOOR 1:r
ORGANISATION ON
ORGCOLS RrPp
ORGFLOOR 1:r
DOMAIN ON
DOMCOLS RrPp
DOMSORTBY REQUESTS
DOMFLOOR 1:r
SUBDOMFLOOR 1:r
SUBDOMSORTBY REQUESTS
REQUEST OFF
DIRECTORY ON
DIRCOLS Rrb
FILETYPE ON
TYPECOLS Rrb
SIZE OFF
PROCTIME OFF
REDIR OFF
FAILURE ON
FAILCOLS Rr
REFERER OFF
REFSITE OFF
SEARCHQUERY OFF
SEARCHWORD OFF
REDIRREF OFF
FAILREF OFF
FULLBROWSER OFF
BROWSER ON
BROWFLOOR 1:r
BROWCOLS Rr
BROWSORTBY REQUESTS
SUBBROWSORTBY REQUESTS
SUBBROWFLOOR 1:r
OSREP OFF
VHOST OFF
USER OFF
FAILUSER OFF
STATUS OFF
HOSTLOWMEM 3

The script which calls analog:

#!/usr/bin/perl -w
use integer;

my($oneday)  = 86400;
my($oneweek) = 604800;

sub analyze {
    # print STDERR "$_[0]\n";
    if ($ENV{'OSTYPE'} =~ /^solaris/i) {
        system("/usr/local/analog4.03/analog $_[0]");
    } else {
        system("/usr/bin/analog $_[0]");
    }
}

# Process the command line arguments
if ($#ARGV != 1) {
    die "The number of arguments is wrong\nUsage: updatestats statsdir logfile\n";
}
my($statdir, $logfile) = @ARGV;
my($analogconffile) = "$statdir/analog.cfg";
my($time) = time;
my($thisweekstat) = "$statdir/week" . &getweeksbefore(0, $time);
my(@stattodelete) = ();    # we sometimes do garbage collection
my($timeoption, $logoption);
my($weeklyoption)  = "+g$analogconffile +C\"MONTHLY OFF\"";
my($cachefile)     = "$statdir/cache";
my($runningoption) = "+g$analogconffile +C\"CACHEFILE $cachefile\"";

unless (stat("$thisweekstat")) {
    # build the statistics for last week one more time
    my($lastweekstat) = "$statdir/week" . &getweeksbefore(1, $time);
    @stattodelete = ("$statdir/week" . &getweeksbefore(4, $time),
                     "$statdir/week" . &getweeksbefore(5, $time));
    # just playing it safe, normally the first should be enough
    $timeoption = "+F" . &firstdayofweek(1, $time) . " +T" . &lastdayofweek(1, $time);
    $logoption  = "+C\"LOGFILE $logfile\" +C\"LOGFILE $logfile.1\" +C\"LOGFILE $logfile.2\"";
    # normally the last two should be enough
    analyze("$weeklyoption $timeoption $logoption >$lastweekstat");
    if (-e "$lastweekstat") { chmod(0640, "$lastweekstat"); }

    # Get rid of "false empty reports"
    my($hastogo) = "";
    if (open(LASTWEEK, $lastweekstat)) {
        my($line);
        while ($line = <LASTWEEK>) {
            next if ($line !~ /^x\sSR\s(\d+)/);
            last if ($1 != 0);
            last if (stat("$logfile.1"));
            $hastogo = 1;
            last;
        }
        close(LASTWEEK);
        if ($hastogo) { unlink($lastweekstat) }
    }
}

# build statistics for this week
$timeoption = "+F" . &firstdayofweek(0, $time) . " +T" . &lastdayofweek(0, $time);
$logoption  = "+C\"LOGFILE $logfile\" +C\"LOGFILE $logfile.1\"";
analyze("$weeklyoption $timeoption $logoption >$thisweekstat");
if (-e "$thisweekstat") { chmod(0640, "$thisweekstat"); }
if (@stattodelete) { map { unlink($_) } @stattodelete; }

# Now we deal with the running total.
# Update the cache if the logs have been rotated since the last call.
my(@statinfo);
if (@statinfo = stat("$logfile.1")) {
    my($logmtime, $cachemtime);
    $logmtime = $statinfo[9];
    if (@statinfo = stat("$cachefile")) {
        $cachemtime = $statinfo[9];
    } else {
        $cachemtime = 0;
    }
    if ($cachemtime != $logmtime) {
        # build new cache
        analyze("$runningoption +C\"LOGFILE $logfile.1\" +C\"CACHEOUTFILE $cachefile.new\" +C\"OUTPUT NONE\"");
        utime($logmtime, $logmtime, "$cachefile.new");
        rename("$cachefile.new", "$cachefile");
        if (-e "$cachefile") { chmod(0640, "$cachefile"); }
    }
}
analyze("$runningoption +C\"LOGFILE $logfile\" >$statdir/running");
if (-e "$statdir/running") { chmod(0640, "$statdir/running"); }

# returns the number of the week of the year we were in x weeks before $time
sub getweeksbefore {
    my($diff, $time) = @_;
    $time -= $oneweek * $diff;    # $diff weeks ago
    my($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) = localtime($time);
    if ($wday > $yday) {
        # this week started last year, so we count it there
        ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) = localtime($time - $oneweek);
        $yday += 7;
    }
    return 1 + ($yday - $wday) / 7;    # 1..53
}

sub firstdayofweek {
    my($diff, $time) = @_;
    $time -= $oneweek * $diff;    # $diff weeks ago
    my($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) = localtime($time);
    $time -= $oneday * $wday;     # last Sunday
    ($sec, $min, $hour, $mday, $mon, $year, $wda