Darren J Moffat wrote:
> Garrett D'Amore wrote:
>> Actually, as I've already indicated, apart from being able to lay 
>> blame on person who changed the settings, I don't understand why they 
>> are useful _in the logs_.  I may just be dense.
>
> Given that these are syslog logs, one may be viewing the log entry on a 
> host other than the one that has that link (it might be an SMS message 
> on your phone or an email), so it might not be trivial to run dladm to 
> see what the current status actually is.
>

SMS forwarding of your syslog contents... gak!  If this is common 
practice nowadays, then I'm really glad I quit being a sysadmin about a 
decade ago.

The typical syslog in a big network tends to have a lot of activity, 
more than I'd want to be sent to me via SMS or e-mail, unless it is 
filtered or consolidated somehow.

The more I think about it, the more I realize that this whole mess 
really comes about because some geniuses insist on still forcing link 
speed and duplex instead of letting 802.3u autonegotiation do its job.  
I guess Sun contributed to the current mess at one time by shipping 
buggy 100Mbit implementations that didn't autonegotiate/NWay properly.

I suspect that the problems that people are mostly concerned about are 
where the link duplex is incorrectly set.  Again, probably because one 
side is trying to autonegotiate, but the other side is set to 100 full 
duplex, forced, with 802.3u autonegotiation specifically disabled.  The 
network engineers that insist on continuing to do this should probably 
be thwacked, but that's out-of-scope for this case. :-)  I doubt the 
speed selection is nearly so much a problem.

In any case, there is nothing inherently bad about half duplex (other 
than it may be performance limiting to a certain extent), as long as 
_both_ sides agree on the duplex setting.  It is perfectly reasonable to 
have half duplex negotiated when a hub is inserted into the link, for 
example.

The thing is, when you have it misconfigured, usually you'll be able to 
tell by, for example, getting collisions on a full-duplex link, or 
getting late collisions on a half duplex link.  This situation can 
easily be correlated, and is a far better indication than just naively 
looking at the duplex state alone.
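The correlation described above can be sketched roughly as follows.  This
is only an illustration: the LinkStats fields and the "any non-zero count
is suspicious" rule are assumptions made for the example, not actual
kstat names or an existing FMA diagnosis rule.

```python
from dataclasses import dataclass


@dataclass
class LinkStats:
    """Hypothetical snapshot of a link's state and error counters."""
    duplex: str           # "full" or "half" (assumed encoding)
    collisions: int       # ordinary collisions seen since last snapshot
    late_collisions: int  # late collisions seen since last snapshot


def suspect_duplex_mismatch(stats: LinkStats) -> bool:
    """Return True when the counters contradict the duplex setting.

    Follows the reasoning in the text: any collisions on a full-duplex
    link, or late collisions on a half-duplex link, hint at a mismatch.
    Ordinary collisions on a half-duplex link are expected and ignored.
    """
    if stats.duplex == "full":
        return stats.collisions > 0
    if stats.duplex == "half":
        return stats.late_collisions > 0
    return False  # unknown duplex state: make no claim


if __name__ == "__main__":
    print(suspect_duplex_mismatch(LinkStats("half", 120, 0)))
    print(suspect_duplex_mismatch(LinkStats("half", 120, 4)))
```

A real diagnosis would presumably look at rates over time rather than a
single snapshot, but the point stands: the counters together with the
duplex state say far more than the duplex state alone.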

And this is precisely the kind of analysis that FMA should be doing for 
the customer, rather than just leaving breadcrumbs in the syslogs.  When 
FMA figures out that a misconfiguration likely exists, then _it_ can log 
the specific analysis rather than just this random clue "hey your link 
changed".  (One could even try to be clever and have FMA self-repair 
this situation!)

One other concern about logging link speed is that, as we move to 
greener systems, there is the idea that systems could auto-tune their 
link speed to save power.  In low usage times, 10 or 100Mbit speeds 
consume a _lot_ less power than running a full 1G link.  I suspect it is 
even more dramatic with 10G.  The idea is to auto-tune based on 
cpu/network load.  So one can imagine that even the link speeds might be 
altered without system administrator intervention.  (Although that 
requires both link partners to properly support 802.3u autonegotiation!)
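To make the auto-tuning idea concrete, here is a minimal sketch of what
such a policy might look like.  Everything in it is an assumption for
illustration: the function name, the 50% headroom figure, and the use of
network load alone (a real policy might also weigh CPU load) are not
taken from any existing implementation.

```python
# Candidate speeds in Mbit/s, lowest (cheapest in power) first.
SPEEDS_MBPS = (10, 100, 1000)


def pick_link_speed(load_mbps: float, headroom: float = 0.5) -> int:
    """Pick the lowest speed that carries the load with room to spare.

    A speed qualifies when the observed load is at most `headroom`
    times its capacity.  If no speed qualifies, use the fastest one.
    """
    for speed in SPEEDS_MBPS:
        if load_mbps <= speed * headroom:
            return speed
    return SPEEDS_MBPS[-1]


if __name__ == "__main__":
    for load in (2, 30, 400, 900):
        print(load, "->", pick_link_speed(load))
```

With a policy like this running, the link speed can change many times a
day with no administrator involved, which is exactly why logging every
speed change becomes noise.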

So, here's the quick summary of pros/cons for logging extended details:

Pros for logging extended link state data:

    * admins are used to it for most ethernet drivers
    * syslog is apparently more accessible for some administrators than
      the CLI tools for kstat/dladm
    * syslog allows historical data about link settings to be recorded

Cons for logging extended link state data:

    * differences between link media/link types
    * probably not reasonable to log data for non-802.3 links
    * enables continued (ab)use of the syslog facility for programmatic
      notification
    * link data may be relatively volatile (e.g. power management
      changing speeds)
    * by itself, link duplex/state data is inadequate to properly
      diagnose faults
    * makes the framework aware of link media in ways that it is not
      already aware
    * duplicates (for non-historical uses) data already available via
      preferred kstat/dladm APIs

I still really, really think continuing to log the link speed and duplex 
data is a bad idea.  I'm starting to think logging _anything_ is 
questionable, but I understand that the cable connection events are 
interesting to watch in syslog.

I'd really, really like to see folks agree to start with syslogging only 
link up/down (no speed/duplex), and follow it up with a proper FMA 
analysis of duplex-related link errors, so that we can actually perform a 
more meaningful analysis, and either offer or actually perform whatever 
corrective action is required.
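As a rough sketch of what "link up/down only" could look like, the
function below builds one fixed-format message per event.  The
"link=... state=..." layout and the link name in the example are purely
hypothetical; they are not the format any driver or framework emits
today.

```python
def format_link_event(link: str, up: bool) -> str:
    """Build a minimal link event message: link name and up/down only.

    Speed and duplex are deliberately left out; those would be read
    from kstat/dladm when needed.
    """
    state = "up" if up else "down"
    return f"link={link} state={state}"


if __name__ == "__main__":
    # Hypothetical link name, for illustration only.
    print(format_link_event("e1000g0", True))
    print(format_link_event("e1000g0", False))
```

A single fixed layout like this would also be consistent across link
types, since it carries nothing that is specific to the link media.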

Ultimately, no matter _what_ we wind up doing, there are going to be a 
large number of people unhappy.

Some vociferous users complain that we spam the logs, even as it is, and 
don't want more data there than absolutely necessary... they'd be 
happiest with no syslog data at all for link events.  Some vociferous 
users want even _more_ data in the logs, and complain about any 
reduction of the data that is there... indeed some probably want more 
detail than we even provide today (such as 802.11 link details).  Some 
vociferous users/developers will complain that the logged data is 
inconsistent and unparseable if we do nothing to address it.

Given that nobody (or at least very few people) will be happy with 
anything we do (or do not do) in the short run, I'm inclined to follow 
the path that gives the greatest long term wins.  I believe that the 
path suggested above (take the case as originally filed, and follow up 
with FMA analysis later) is architecturally the strongest, takes us in a 
direction we need to go, and carries a minimum of baggage with it.

Ultimately at this point, the decision isn't mine though.  But that's my 
recommendation.

    -- Garrett
