GLDv3 link status logging [PSARC/2007/298 Self Review]

Garrett D'Amore Tue, 29 May 2007 14:02:50 -0700

John Plocher wrote:
> Garrett D'Amore wrote:
>> Also, right now, these applications will see the link up/down 
>> messages in the syslogs, and they can take whatever action they feel 
>> is appropriate when the syslogs indicate that such action is necessary.
>
>
> I think you are missing the point.
>
> Sure, one could do all that extra work to figure out
> what is really happening after they see an obfuscated
> and truncated teaser message in syslog.  But why should
> we go out of our way to make things harder for admins?
> Better that we ensure that we record useful and relevant
> info, especially when we have it at our fingertips at
> the time.
>
> What is syslog for?  Why do we even log messages - heck,
> if a sysadmin really wanted to know about the state of
> the system, they could just use dladm, ifconfig, ping,
> traceroute, snoop, SNMP traps, dtrace and truss.  After
> all, if you don't know how to use those tools, you should
> be banished to a Linux system where you are forced to
> use GNUtar with Bash :-)


There are very friendly tools built upon SNMP.  Please don't put 
DTrace and SNMP in the same bucket.

In the Linux world, they have a dladm-like tool as well; it's called 
ethtool.  And the link status logging in the Linux world is even _more_ 
of a mess than it is here.  Again, let's not go there.


>
> Have you ever used Splunk (www.splunk.com) to manage a
> system?  If not, stop reading this and go get a free
> copy and play with it.  It isn't the only or best way
> to admin complex systems, but it shows what can be done
> with simple logfile analysis.  Combining web logs with
> syslog (with ...) lets me do the root cause analysis to
> find out WHY a problem is happening - even if the problem
> itself wasn't obvious at the time.

I've used tools like Splunk, though not Splunk itself.

Yes, logfile analysis can be very very useful, but usually that is a 
last resort when the *up front* tools can't tell you what is going on, 
or for sites that don't use or don't want to use an architected solution 
based on SNMP.

Basically, by not providing good tools up-front to solve these problems 
(via SNMP, CLI apps, or whatever), we have created the market for 
Splunk.  Splunk fills a gap in our story, which I think we should fix by 
closing the gap, rather than continuing to build upon that gap.

>
> Please find a way to keep this useful info in the network
> interface messages.  The cost of putting it there will be
> significantly (IMHO several orders of magnitude) smaller
> than the time and effort saved at only one customer site
> by having it there.

I'm seriously, seriously skeptical of that.  As I said, to fix even 
_one_ problem, you shouldn't need historical data.  All you need is a 
snapshot of the current state to fix whatever the problem is.

Unless you believe that link state changes are a root cause for other 
problems, and you want to point the finger at link state changes for 
those other problems....  I'm not sure what those would look like.

>
> Yes, this implies that the network drivers will all need
> a similar set of kstats (or whatever) for this to work;
> also a small effort when compared to the benefits.

GLDv3 has that for network drivers of a common media type.  It's less 
uniform for pre-GLDv3 and for non-802.3 media.

    -- Garrett

GLDv3 link status logging [PSARC/2007/298 Self Review]
