Comments embedded in message.

-- Duke

Danny Freedman wrote:

:-----Original Message-----
:From: [EMAIL PROTECTED] [mailto:owner-analog-
:[EMAIL PROTECTED] On Behalf Of Jeremy Wadsack
:Sent: 26 February 2004 16:45
:To: [EMAIL PROTECTED]
:Subject: Re: [analog-help] Can analog be used to filter log files
:
:
:A couple comments pertaining to your cases below. If you are trying to
:filter log files by known information, grep or even a Perl script can
:actually be faster than Analog. Especially when splitting server logs
:into separate logs for virtual hosts or sites.
[DF] analog gets through over 30 million data lines in less than 90
minutes on a Sun box - and that includes an extensive list of excludes
and a number of other processes. (If you discount the time it takes to
produce the report from the cache files this comes down to about 75
minutes).

Hard to guess the time involved for someone else's box,
but this would probably take 3 minutes with grep or perl.
-- DH
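
For comparison, a grep-style filter is only a few lines of script. What
follows is a minimal Python sketch, not anything taken from Analog; the
function name filter_lines, the patterns and the sample lines are all
made up for illustration, and real exclude lists would be longer.

```python
import re


def filter_lines(lines, include, excludes=()):
    """Yield lines that match `include` and none of the `excludes` patterns.

    All names here are hypothetical; this mimics `grep PATTERN | grep -v ...`.
    """
    include_re = re.compile(include)
    exclude_res = [re.compile(p) for p in excludes]
    for line in lines:
        if not include_re.search(line):
            continue  # not one of the lines we are interested in
        if any(rx.search(line) for rx in exclude_res):
            continue  # explicitly excluded
        yield line


if __name__ == "__main__":
    # Made-up sample lines standing in for a real server log.
    sample = [
        "GET /index.html",
        "GET /private/a.html",
        "GET /logo.gif",
    ]
    for kept in filter_lines(sample, r"\.html", [r"/private/"]):
        print(kept)
```

Because the function is a generator it handles one line at a time, so an
open log file can be passed in directly without loading it into memory.
How fast it runs on 30 million lines will depend on the box and on how
many patterns are involved.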

:
:Analog processes log file lines in a stream, sequentially. As each
:line is encountered it's checked against the data filters to determine
:if it will be included and then the data is added to the hashes that
:hold the results for the reports. (This is simplified of course.) So
:there is no place where Analog really could write out a log file of
:the lines that are just in the Request Report. In fact the cache files
:are fairly close to a serialized version of a memory dump just before
:reports are written.
[DF] this gets to the heart of my question. Could the log line not
therefore be written out at the point at which it is determined it will
be included?


This seems a little distant from Analog's stated purpose.
-- DH
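
To make the idea in the question concrete, here is a rough Python sketch
of a streaming pass that counts a request and copies the raw line out at
the moment the line is accepted. It is not Analog's code and says
nothing about how Analog is structured internally; count_and_copy, the
REQUEST pattern and the accept callback are hypothetical names, and the
pattern assumes a common-log-style request field such as
"GET /path HTTP/1.0".

```python
import re
from collections import Counter

# Assumption: the request appears as "METHOD /path ..." inside double quotes.
REQUEST = re.compile(r'"[A-Z]+ (\S+)')


def count_and_copy(lines, accept, out):
    """Count accepted requests and write each accepted raw line to `out`.

    `accept` is a caller-supplied function taking the requested path and
    returning True if the line should be counted (a stand-in for the
    include/exclude rules).
    """
    counts = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if match is None:
            continue  # unparseable line, skipped
        path = match.group(1)
        if not accept(path):
            continue  # rejected by the filter rules
        counts[path] += 1
        out.write(line)  # same decision point: counted, so copied out
    return counts
```

Called with an open log file as `lines` and a writable file as `out`, the
output file ends up holding exactly the lines that contributed to the
counts, which is the property the auditing example is after.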

:
:And on the auditing point, perhaps I don't fully understand it, but if
:you are using a third party auditor to verify your results and you
:send them data that exactly corresponds to the results, wouldn't that
:defeat the purpose of the auditing?
[DF] No, the idea is that they check the lines we think are valid. No
point in sending them stuff that we have already determined will be
rejected.


Unless they charge extra, why not send them everything?
-- DH


Regards, DF

:
:--
:
:Jeremy Wadsack
:Wadsack-Allen Digital Group
:
:
:Danny Freedman <[EMAIL PROTECTED]> (Thursday, February 26, 2004 3:41 AM):
:
:> Just me then?
:
:> There are lots of uses I could put this to, but maybe that is just
:> because we have such large amounts of data to deal with that the
:> ability to chop our logs up into manageable chunks very quickly would
:> be invaluable.
:
:> Some examples:
:> 1) My biggest data job is auditing the site. I use analog to give me
:> a top-line daily page impression count for a month's data.
:> This requires lots of exclusion rules which can get quite complex.
:> To verify the claim I send the auditors sample log data for a number
:> of random days which they then use to try and match the figures. This
:> very painful and drawn out task would be a whole lot easier if I could
:> just send them the log data for the records that have been counted.
:
:> 2) Sometimes when trying to design a particular analog report it is
:> not always clear if parameters have done what I expect them to do.
:> Having the accepted log lines available would make it easy to check
:> this.
:
:> 3) We already do lots of pre-filtering of logs but the scripts we use
:> are less configurable and probably slower than analog.
:
:> 4) Cache files are not a substitute for log file format. Cache files
:> may not include all the info you want for a second run on the data.
:> You can't eyeball them to see what they contain.
:
:> 5) We have lots of sub-sites and don't have time/resources to do all
:> the analysis that individual producers want. With this we could
:> produce a separate data set that we could give them to do their own
:> work on.
:
:> Regards,
:> Danny Freedman
:
:
:
:> :-----Original Message-----
:> :From: [EMAIL PROTECTED] [mailto:owner-analog-
:> :[EMAIL PROTECTED] On Behalf Of S Collis
:> :Sent: 26 February 2004 05:04
:> :To: [EMAIL PROTECTED]
:> :Subject: Re: [analog-help] Can analog be used to filter log files
:> :
:> :Presumably to keep a cache file?  But isn't that what analog cache
:> :files are for?
:> :
:> :I'm confused now...
:> :
:> :*********** REPLY SEPARATOR  ***********
:> :
:> :On 25/02/2004 at 14:32 Duke Hillard wrote:
:> :
::>>What would be the purpose of such a file?
::>>
::>>-- Duke
::>>
::>>
::>>Danny Freedman wrote:
::>>
::>>> Analog is great but it would be fantastic if as well as producing
::>>> a report it could write out each log record that it accepts into a
::>>> new log file.
::>>>
::>>> Such logfile output could be linked to a particular report. For
::>>> example:
::>>>
::>>> REQLOGOUT validrequests.log
::>>>
::>>> would cause all the log lines that had been counted in the REQUEST
::>>> Report to be written out to the "validrequests.log" file.
::>>>
::>>> Would this kind of function be difficult to add seeing as I assume
::>>> each line is being read and processed anyway?
::>>>
::>>> Regards,
::>>>
::>>> Danny Freedman
::>>>
::>>>
::>>> BBCi at http://www.bbc.co.uk/
::>>>



