Andrew Houghton ([EMAIL PROTECTED]; Tuesday, February 04, 2003 12:59 PM):
> I have ~ 100MB of daily logs that I need to process. I need to provide
> historical information in the same file as I provide recent daily
> information, but I want to throw away the fine-grained detail as it
> becomes older.
> Ideally, I'd have 15 days' worth of fine-grained log analysis, 30 days
> of medium-grained analysis (for days -16 -> -45), followed by all
> historical data at a low-grained level.
> I *think* the following will work, but I wanted to run it by this group
> to get some feedback before I start writing scripts.
> 1. process nightly logs into distinct nightly cache files with
> fine-grained info.
> 2. keep nightly cache files for 45 days
> 3. every night, create a new history cache file out of previous
> history cache + oldest nightly (low-grained info).
> 4. delete oldest nightly.
> 5. every night, create a brand new cache file of medium-grained
> info out of the oldest 30 nightly cache files
> 6. every night, create a brand new cache file of fine-grained info
> out of the newest 15 nightly cache files
> 7. every night, create a single fine-grained run of the history
> cache file, the medium-grained cache file, and the fine-grained
> cache file.
> This seems like it should work, and it seems like it should provide the
> information we need while providing savings of processing time and
> memory usage.
> Is there anything in here that doesn't make sense? Anything that I need
> to consider? Will the final nightly run suffer in any way from
> including sparse historical data?
First of all, the fine-grained window should be 14 days, not 15 (you
just deleted the 45th day, so only 44 days are left, and the
medium-grained file takes the oldest 30 of them).
Second, in order to avoid common cache-file pitfalls, heed these
points:
* As you specify, make sure you delete the 45th day before building
the medium-grained file; otherwise you will double-count that day.
* Make sure you never include daily, fine-grained cache files in your
reporting step or they will interfere with the other cache files.
* You don't really need to create the new fine-grained cache file of
the most recent 14 days -- you can just include the daily cache files
directly, which saves a step of processing.
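For concreteness, here is a minimal sketch (Python, purely
illustrative -- the 14/30-day split follows the corrected scheme
above, and nothing here is an Analog command) of how the retained
nightly caches divide into the fine- and medium-grained windows after
the oldest day has been merged into the history cache and deleted:

```python
from datetime import date, timedelta

def window_dates(today, fine_days=14, medium_days=30):
    """Split the retained nightly caches into fine- and medium-grained
    windows.  Day -1 is yesterday (the newest nightly cache); the 45th
    day has already been merged into the history cache and deleted,
    so 44 nightly caches remain."""
    retained = [today - timedelta(days=n)
                for n in range(1, fine_days + medium_days + 1)]
    fine = retained[:fine_days]      # newest 14 nightly caches
    medium = retained[fine_days:]    # next 30 nightly caches
    return fine, medium

fine, medium = window_dates(date(2003, 2, 4))
print(len(fine), len(medium))        # 14 30
print(fine[0], medium[-1])           # 2003-02-03 2002-12-22
```

The two windows never overlap, which is exactly the property that
prevents the double-counting described in the first bullet.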
Now, I'm not sure how you are defining 'low-grained' and
'medium-grained' and whether that will reduce the memory used by
Analog. The cache files contain a complete record of each internal
table Analog uses while processing. In order to get accurate results
you need to store every record there was. If you apply any filters or
aliases to some cache files and not others then you aren't comparing
the same thing.
For example, suppose you have these records in your daily,
fine-grained cache of 16 days ago:
REQUEST: /index.html, 234R
REQUEST: /index.html?show=login, 45R
And these records in your daily, fine-grained cache of yesterday:
REQUEST: /index.html, 154R
REQUEST: /index.html?show=login, 12R
Now you reduce the granularity of the older cache file by removing
arguments, so that your medium grained cache file contains:
REQUEST: /index.html, 279R
When you run the reports you'll see this, which isn't necessarily
accurate:
Reqs | File
----------------------------------
433 | /index.html
12 | /index.html?show=login
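The totals above can be reproduced with a small sketch (Python, just
modeling the request counts -- the `strip_arguments` helper is a
hypothetical stand-in for whatever reduction you apply, not anything
Analog provides):

```python
from collections import Counter

# Yesterday's fine-grained cache (arguments kept) and the 16-day-old
# cache as it looked before being reduced to medium grain.
yesterday = Counter({"/index.html": 154, "/index.html?show=login": 12})
older = Counter({"/index.html": 234, "/index.html?show=login": 45})

def strip_arguments(cache):
    """Reduce granularity by folding '?...' variants into the bare file."""
    reduced = Counter()
    for url, reqs in cache.items():
        reduced[url.split("?", 1)[0]] += reqs
    return reduced

medium = strip_arguments(older)          # {'/index.html': 279}
report = medium + yesterday              # what the merged report shows
print(report["/index.html"])             # 433
print(report["/index.html?show=login"])  # 12
```

Because the older file's 45 argument requests were folded into
/index.html before the merge, the two figures in the report are no
longer measuring the same thing.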
This happens because, while Analog collects the information for
particular days out of the cache files, it aggregates that information
across all of the periods reported. So I'm not sure that what your
reports tell you will really be what you want them to show.
--
Jeremy Wadsack
Wadsack-Allen Digital Group
+------------------------------------------------------------------------
| TO UNSUBSCRIBE from this list:
| http://lists.isite.net/listgate/analog-help/unsubscribe.html
|
| Digest version: http://lists.isite.net/listgate/analog-help-digest/
| Usenet version: news://news.gmane.org/gmane.comp.web.analog.general
| List archives: http://www.analog.cx/docs/mailing.html#listarchives
+------------------------------------------------------------------------