Hi Clayton

> I need input on:
> 1. Number of devices you are managing logs for (large scale being over
> 10,000 devices)

We're currently managing logs for ~200 network devices and another 500-600 servers using syslog-ng; perhaps not big enough, but who knows ;)

> 2. What log levels you are sending from the devices (i.e. 0-6 for normal
> operation, 0-7 when troubleshooting?)

We have a central loghost in each physical site which accepts and queues logs, forwarding them on to our master box as long as it's available. On the master box, I have a piece of Ruby which filters out common garbage we're unable to remove; mostly this comes from legacy software which logs far too much at inappropriate levels and which we don't care about much any more. In normal operating mode, we actively track 0-5; when we're tracing a problem you can flip the mangler into a more verbose mode where it inserts 0-6 into MySQL. Typically we don't look at the DEBUG level at all, since if you're working with an application in that much depth you might as well configure it to log locally for a while :)

> 3. What log levels you are reacting on (if not all).

Stated above.

> 4. How many people are assigned to look at log messages

Nobody is specifically assigned to the task; the severity of the problem determines the response. Generally speaking it just creates a ticket which someone on support rotation can take action upon; really urgent stuff triggers SMS messages.

> 5. What program(s) are used to do log analysis

Bespoke bits and pieces written in Ruby. We have a very interesting 'attack analysis' module in development which scans for common dictionary-based attacks and, subject to certain conditions being met, should be able to null route persistent buggers.

> 6. How are you analyzing the logs? Are you doing a baseline analysis (based
> on number of events per device) or are you reacting on every incoming
> message...or do you just ignore them because there are too many to look at?

Baseline analysis to detect host downtime, i.e. we specify a minimum number of INSERT queries per second we expect to see generated by each host.
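To make the baseline idea concrete, here is a rough sketch of that kind of check in Ruby. It is not our actual code: the host names, the thresholds, and the `hosts_below_baseline` helper are all made up for illustration, and it assumes you already have a per-host count of rows inserted during some time window.

```ruby
# Hypothetical sketch: flag hosts whose log INSERT rate falls below a floor.
# Host names, thresholds, and the counts hash are illustrative assumptions.

# Minimum expected INSERTs per second for each host (assumed values).
MIN_INSERTS_PER_SEC = {
  "web01"    => 2.0,
  "db01"     => 0.5,
  "router-a" => 0.1
}.freeze

# counts:         host => number of rows inserted during the window
# window_seconds: length of the observation window in seconds
# Returns the sorted names of hosts that logged less than their baseline.
# A host missing from counts is treated as having logged nothing.
def hosts_below_baseline(counts, window_seconds, minimums = MIN_INSERTS_PER_SEC)
  raise ArgumentError, "window must be positive" unless window_seconds.positive?

  quiet = minimums.select do |host, min_rate|
    observed_rate = counts.fetch(host, 0) / window_seconds.to_f
    observed_rate < min_rate
  end
  quiet.keys.sort
end

# Example: rows seen per host in the last 60 seconds.
counts = { "web01" => 150, "db01" => 10 }
puts hosts_below_baseline(counts, 60).inspect
```

A host that has gone completely silent never appears in the counts at all, which is why the lookup defaults to zero rather than skipping unknown hosts.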
Apart from that, messages are filtered and classified after insertion.

> 7. Anything I missed?

Don't think so :) I suppose my only other comment is that we mostly use syslog-ng and PHP-Syslog-NG in a reactive fashion, to track down the cause of problems and assess the extent of any potential issue; active monitoring is handled by Nagios.

Best Regards,
Alex

_______________________________________________
Php-syslog-ng-support mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/php-syslog-ng-support
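P.S. For anyone wondering what the normal/verbose switch from question 2 might look like, here is a minimal sketch. The `SeverityFilter` class and its method names are hypothetical and not taken from the real mangler; the only things carried over from the description above are the two ranges (0-5 normally, 0-6 when tracing) and the standard syslog severity numbering.

```ruby
# Hypothetical sketch of a severity filter with a normal and a verbose mode.
# Class and method names are illustrative, not from the real filtering script.
class SeverityFilter
  # Highest syslog severity number kept in each mode
  # (0 = emergency ... 5 = notice, 6 = informational, 7 = debug).
  MAX_SEVERITY = { normal: 5, verbose: 6 }.freeze

  attr_reader :mode

  def initialize(mode = :normal)
    self.mode = mode
  end

  # Switch modes, rejecting anything that is not a known mode.
  def mode=(new_mode)
    raise ArgumentError, "unknown mode: #{new_mode}" unless MAX_SEVERITY.key?(new_mode)

    @mode = new_mode
  end

  # True if a message with this numeric severity should be stored.
  def keep?(severity)
    severity.between?(0, MAX_SEVERITY.fetch(@mode))
  end
end

filter = SeverityFilter.new
puts filter.keep?(6)      # informational message in normal mode
filter.mode = :verbose
puts filter.keep?(6)      # same message while tracing a problem
puts filter.keep?(7)      # debug is dropped in both modes
```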

