Re: Log analysis server suggestions?

2006-02-20 Thread Ashley Moran
On Thursday 16 February 2006 15:07, Nathan Vidican wrote:
> I would advise against trying to log everything into SQL records; aside
> from the performance hit of translating log writes into SQL inserts and
> then having the SQL server write to disk anyway, it just complicates
> things unnecessarily.

You are probably right.  I was thinking that it would be easier to search 
through in a database, but most of the issues we are interested in (e.g. 
disk failure) are things we want to know about *now*, rather than the sort of 
thing that is revealed by historical analysis.

> My advice would be to take a step back and look at what's important to you.
> I find it's best to work with a mixture of things and hack your own scripts
> to fill in the gaps.

Having looked at some logs, I think most of the stuff we are interested in is 
probably specific to our setup.  Log formats are so loose that I doubt any 
off-the-shelf log analysis tool would be much good unless it was 10x more 
complex than most of the software we want to log anyway.

It's surprised me how much time and effort it takes to turn logs into useful 
data.  And I wonder how Windows admins get by at all.

Thanks for the advice
Ashley


Re: Log analysis server suggestions? [long]

2006-02-20 Thread Ashley Moran
On Thursday 16 February 2006 15:30, Chuck Swiger wrote:
> I'm not sure who the original poster was, but whoever is interested in this
> topic might benefit by reading a thread from the firewall-wizards mailing
> list:

[snip]

Cheers, that was very useful.  I've put it into our company wiki so it can be 
ignored by everyone :)

I like the 3-stage processing:
> Simply design your analysis as an always 3-stage process consisting of:
> - weeding out and counting instances of uninteresting events
> - selecting, parsing sub-fields of, and processing interesting events
> - retaining events that fell through the first two steps as unusual

That solves the problem of overlooking log messages you didn't anticipate, 
although it adds a lot to the initial server configuration.

Ashley


Log analysis server suggestions?

2006-02-16 Thread Ashley Moran
Until recently I had a server running syslog-ng set to archive all logs into 
server/year/month/day/ directories.  Now that the server is running on amd64, 
we've lost our hi-res scrolling display, so I want to look at a better log 
watching system.

I've read about logging to a database.  I quite like the idea of storing our 
logs in PostgreSQL (I don't like MySQL and don't want to get involved in 
administering a second database).  I know I can log to a PG database quite 
easily, but I don't know how I can get the data back out without writing 
manual queries.
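
By "manual queries" I mean something like the following, run against a purely
hypothetical table with timestamp/host/program/message columns (the real
schema would depend on whatever script or front end does the inserts):

  -- hypothetical schema, for illustration only
  SELECT received_at, host, program, message
    FROM syslog
   WHERE host = 'fileserver1'                      -- placeholder host name
     AND message ILIKE '%ad0%'                     -- e.g. hunting disk errors
     AND received_at > now() - interval '7 days'
   ORDER BY received_at;

Writing a handful of those by hand every time we want to check something is
exactly what I'd like to avoid.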

Here is what I need:

- Logs stored for the last 6 months or so, and easily searchable
- Live log watching
- Log analysis

I might try swatch for the live log watching as this is not affected by the 
choice of log storage and seems the best tool for the job.
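
Something along these lines is roughly what I have in mind - only a sketch,
and I haven't checked the exact action syntax against the version in ports,
so treat the details as guesses:

  # ~/.swatchrc (sketch - see swatch(1) for the exact action syntax)
  # flag anything that looks like an ATA/SCSI disk error on the console
  watchfor /(ad|da)[0-9]+:.*(error|FAILURE|timeout)/
        echo bold
        bell 3
        throttle 00:05:00
  # mail and exec actions exist too, for paging someone out of hours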

As for searching / analysis, I've seen php-syslog-ng 
( http://www.vermeer.org/projects/php-syslog-ng ), which looks very basic, 
and phpLogCon ( http://www.phplogcon.com/ ), which does not support PG 
anyway.  Is there anything better GUI-wise?

Maybe I am best off keeping the logs in text files for now, and spending more 
time on swatch.

Any thoughts?

Cheers
Ashley


Re: Log analysis server suggestions?

2006-02-16 Thread Nathan Vidican

Ashley Moran wrote:
> Until recently I had a server running syslog-ng set to archive all logs into
> server/year/month/day/ directories.  Now that the server is running on amd64,
> we've lost our hi-res scrolling display, so I want to look at a better log
> watching system.
>
> I've read about logging to a database.  I quite like the idea of storing our
> logs in PostgreSQL (I don't like MySQL and don't want to get involved in
> administering a second database).  I know I can log to a PG database quite
> easily, but I don't know how I can get the data back out without writing
> manual queries.
>
> Here is what I need:
>
> - Logs stored for the last 6 months or so, and easily searchable
> - Live log watching
> - Log analysis
>
> I might try swatch for the live log watching as this is not affected by the
> choice of log storage and seems the best tool for the job.
>
> As for searching / analysis, I've seen php-syslog-ng
> ( http://www.vermeer.org/projects/php-syslog-ng ), which looks very basic,
> and phpLogCon ( http://www.phplogcon.com/ ), which does not support PG
> anyway.  Is there anything better GUI-wise?
>
> Maybe I am best off keeping the logs in text files for now, and spending more
> time on swatch.
>
> Any thoughts?
>
> Cheers
> Ashley


In my experience, log files are best NOT kept in SQL; flat files are easy to deal 
with, and with a few simple Perl scripts you could accomplish all you need to. 
You can run a tail -f and dump the output to stdout, or even pipe it to a socket 
and monitor remotely. Also, various programs have great open-source analysers 
for specific logs (e.g. Apache, sendmail, etc).
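
For example, something as simple as this gives you a crude remote view (the
host name and port are made up, and some netcat versions want "nc -l -p 5140"
on the listening side):

  # on the logging box: follow the file and push new lines to a listener
  tail -F /var/log/messages | nc monitorhost 5140

  # on the monitoring box: just print whatever arrives
  nc -l 5140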


I would advise against trying to log everything into SQL records; aside from the 
performance hit of translating log writes into SQL inserts and then having the 
SQL server write to disk anyway, it just complicates things unnecessarily.


My advice would be to take a step back and look at what's important to you. 
Decide which logs need to be monitored in real time: are there certain criteria 
that require immediate attention? What about alphanumeric pager systems, or 
emailed warnings? Are customers going to require reports/information (e.g. web 
server stats, sendmail relay stats or spam logs, bandwidth usage, etc)? It of 
course depends on your overall system, and your users more than anything... but 
in the end I find it's best to work with a mixture of things and hack your own 
scripts to fill in the gaps.


Just my two cents, hope it helps.

--
Nathan Vidican
[EMAIL PROTECTED]
Windsor Match Plate & Tool Ltd.
http://www.wmptl.com/


Re: Log analysis server suggestions? [long]

2006-02-16 Thread Chuck Swiger
Ashley Moran [EMAIL PROTECTED] wrote:
[ ... ]

I'm not sure who the original poster was, but whoever is interested in this
topic might benefit by reading a thread from the firewall-wizards mailing list:

 Original Message 
Subject: [fw-wiz] parsing logs ultra-fast inline
Date: Wed, 01 Feb 2006 16:03:38 -0500
From: Marcus J. Ranum [EMAIL PROTECTED]
To: firewall-wizards@honor.icsalabs.com

OK, based on some offline discussion with a few people, about
doing large amounts of system log processing inline at high
speeds, I thought I'd post a few code fragments and some ideas
that have worked for me in the past.

First off, if you want to handle truly ginormous amounts of log
data quickly, you need to build a structure wherein you're making
decisions quickly at a broad level, then drilling down based on
the results of the decision. This allows you to parallelize infinitely
because all you do is make the first branch in your decision tree
stripe across all your analysis engines. So, hypothetically,
let's say we were handling typical UNIX syslogs at a ginormous
volume, we might have one engine (CPU/process or even a
separate box/bus/backplane/CPU/drive array) responsible for
(sendmail | named) and another one responsible for (apache | imapd)
etc. If you put some simple counters in your analysis routines
(hits versus misses) you can load balance your first tree-branch
appropriately using a flat percentage. Also, remember, if you
standardize your processing, it doesn't matter where it happens;
it can happen at the edge/source or back in a central location
or any combination of the two. Simply design your analysis as
an always 3-stage process consisting of:
- weeding out and counting instances of uninteresting events
- selecting, parsing sub-fields of, and processing interesting events
- retaining events that fell through the first two steps as unusual
The results of these 3 stages are:
- a set of event-IDs and counts
- a set of event-IDs and interesting fields and counts
- residual data in raw form
Back-haul the event-IDs and counts and fields and graph them or stuff
them into a database, and bring the residual data to the attention of a human
being.

I suppose if you needed to you could implement a log load
balancer in the form of a box that had N interfaces that
collected a fat stream of log data, ran a simple program
that sorted the stream into 1/N sub-streams and forwarded
them to backend engines for more involved processing. You
could scale your logging architecture to very very large
loads this way. It works for Google and it'd work for you, too.
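
The splitter itself doesn't need to be anything fancy; a sketch of the whole
"simple program" (Python here, with invented backend names and ports) might
look like:

  # read a fat stream of syslog lines on stdin and fan them out to N backend
  # engines, keyed on the program name so each sub-stream always reaches the
  # same engine
  import socket
  import sys
  import zlib

  BACKENDS = [("engine1", 5140), ("engine2", 5140)]   # hypothetical engines

  out = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
  for line in sys.stdin:
      # classic syslog lines look like "Feb 16 15:07:01 host program[pid]: ..."
      fields = line.split()
      program = fields[4].split('[')[0].rstrip(':') if len(fields) > 4 else "?"
      engine = BACKENDS[zlib.crc32(program.encode()) % len(BACKENDS)]
      out.sendto(line.encode(), engine)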

The first phase of processing is to stripe across engines if
necessary, then inside each engine you stripe the processing
into functional sub-parsers that deal with a given message
format. The implementation is language-irrelevant though your
language choice will affect performance. Typically you write a
main loop that looks like:
while ( get a message ) {
    if (message is a sendmail message)
        parse sendmail message
    if (message is an imap message)
        parse imap message
    ...
}

Once your system has run on a sample dataset you will be able
to determine which messages come most frequently and you can
put that test at the top of the loop. This can result in an enormous
performance boost.

Each sub-parse routine follows the same structure as the
main loop, performing a sorted series of checks to sub-parse
the fields of the message-specific formats. I.e.:

parse sendmail message( ) {
    if (message is a stat=sent message) {
        pull out recipient;
        pull out sender;
        increment message sent count;
        add message size to sender score;
        done
    }
    if (message is a stat=retry message) {
        ignore; // done
    }
    if (message is a whatever) {
        whatever;
        done
    }

    // if we fell through to here we have a new message structure
    // we have never seen before!!
    vector message to interesting folder;
}
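
In a real language the same skeleton is only a few lines. A rough sketch in
Python (the regexes and field names are invented - as noted, the details are
always site specific):

  # stage 1: uninteresting events are weeded out and counted
  # stage 2: interesting events have sub-fields pulled out and counted
  # stage 3: whatever falls through is retained as unusual
  import re
  import sys
  from collections import Counter

  counts = Counter()
  interesting = Counter()
  residual = []

  SENT = re.compile(r'sendmail\[\d+\]:.*to=<([^>]+)>.*stat=Sent')

  def parse_sendmail(line):
      m = SENT.search(line)
      if m:
          interesting['sent to ' + m.group(1)] += 1
          return True
      if 'stat=Deferred' in line or 'stat=queued' in line:
          counts['sendmail deferred/queued'] += 1
          return True
      return False                    # a sendmail message we have never seen

  for line in sys.stdin:
      if 'sendmail[' in line:         # put the most frequent test first
          handled = parse_sendmail(line)
      else:
          handled = False             # ...more sub-parsers go here...
      if not handled:
          residual.append(line)       # vector to the "interesting" pile

  print(counts)
  print(interesting)
  print('unusual lines: %d' % len(residual))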

Variant messages are a massive pain in the butt; you need to decide
whether to deal with variants as separate cases or to make the
sub-parser smarter in order to deal with them. This is one of the
reasons I keep saying that system log analysis is highly site
specific! If your site doesn't get system logs from a DECStation
5000 running ULTRIX 3.1D then you don't need to parse that data.
Indeed, if you build your parse tree around that notion then, if you
suddenly start getting ULTRIX format log records in your data
stream, that'd be - shall we say - interesting and you want to know
about it. I remember when I was looking at some log data at one
site (Hi Abe!) we found one log message that was about 400K long
on a single line. It appeared that a fairly crucial piece of software
decided to spew a chunk of its memory in a log message, for no
apparent reason. 

Re: Log analysis server suggestions?

2006-02-16 Thread Kurt Buff
Ashley Moran wrote:
> Until recently I had a server running syslog-ng set to archive all logs into
> server/year/month/day/ directories.  Now that the server is running on amd64,
> we've lost our hi-res scrolling display, so I want to look at a better log
> watching system.
>
> I've read about logging to a database.  I quite like the idea of storing our
> logs in PostgreSQL (I don't like MySQL and don't want to get involved in
> administering a second database).  I know I can log to a PG database quite
> easily, but I don't know how I can get the data back out without writing
> manual queries.
>
> Here is what I need:
>
> - Logs stored for the last 6 months or so, and easily searchable
> - Live log watching
> - Log analysis
>
> I might try swatch for the live log watching as this is not affected by the
> choice of log storage and seems the best tool for the job.
>
> As for searching / analysis, I've seen php-syslog-ng
> ( http://www.vermeer.org/projects/php-syslog-ng ), which looks very basic,
> and phpLogCon ( http://www.phplogcon.com/ ), which does not support PG
> anyway.  Is there anything better GUI-wise?
>
> Maybe I am best off keeping the logs in text files for now, and spending more
> time on swatch.
>
> Any thoughts?
>
> Cheers
> Ashley

http://www.loganalysis.org, and the related listserv might be well worth
your time...


Re: Log analysis server suggestions?

2006-02-16 Thread Olivier Nicole
> As for searching / analysis, I've seen php-syslog-ng
> ( http://www.vermeer.org/projects/php-syslog-ng ), which looks very basic,
> and phpLogCon ( http://www.phplogcon.com/ ), which does not support PG
> anyway.  Is there anything better GUI-wise?

As for the log analysis, I remember attending a security seminar where
the conclusion was that a good log analysis system should let you
define which events are unimportant and can be ignored, so that all
other events, including the unexpected ones, are shown as important
and requiring action.

Best regards,

Olivier