Re: OSS/NM (was CCIE Vs. BS or MS degree [7:60220]

Howard C. Berkowitz Fri, 03 Jan 2003 15:00:31 -0800

At 8:33 PM +0000 1/3/03, bergenpeak wrote:
>Hi Howard,
>
>I'm not suggesting that one should write a book on network management.
>Instead, it seems that most network routing books don't spend any time
>reviewing some of the key MIB objects relevant to the routing protocol
>that should be considered when configuring the relevant NM tools.


For proprietary reasons, I can't get into all the work I've done in 
this area. Suffice it to say that this sort of real-time verification 
needs its own processors, typically workstation based, much as an IDS 
needs to be on a special processor.  Typically, this would be 
economic only in a carrier environment. There are other strategies 
for making extensive use of multiprocessing in routers that by and 
large have not been widely discussed, although you can see some shape 
of things in the IRTF and the IETF ForCES working group, as well as 
some things from the pure researchers.

>
>It does seem naive to think that one could "design it right in the first
>place" and then not have to worry about network operations, as if they
>weren't needed.

Perhaps I overstated. I'm not suggesting that one can design 
error-free operational networks.  I am suggesting that better, more 
abstract, design and provisioning tools can make operations much more 
reliable.  Some of this is trivial, such as using macros and 
databases to generate configs and then load them.  For some examples, 
see my presentation at NANOG, 
http://www.nanog.org/mtg-9811/ppt/berk/index.htm.  I have a slightly 
updated version given at the ARIN October 1999 meeting, but I have to 
get the new URL.
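
As a rough illustration of generating configs from a database, here is a
minimal Python sketch. The template fields, interface names, and addresses
are all hypothetical, and the list of dicts stands in for rows that would
really come from a provisioning database.

```python
from string import Template

# Hypothetical template for one interface stanza; field names are illustrative.
INTERFACE_TEMPLATE = Template(
    "interface $name\n"
    " description $description\n"
    " ip address $address $mask\n"
    " ip ospf cost $cost\n"
)


def render_interfaces(rows):
    """Render one stanza per row; each row is a dict of template fields."""
    # substitute() raises KeyError if a row lacks a field, so an incomplete
    # record fails at generation time instead of producing a partial config.
    return "!\n".join(INTERFACE_TEMPLATE.substitute(row) for row in rows)


# Stand-in for rows fetched from a provisioning database (made-up values).
rows = [
    {"name": "Serial0/0", "description": "to core-1",
     "address": "10.0.0.1", "mask": "255.255.255.252", "cost": 64},
    {"name": "Serial0/1", "description": "to core-2",
     "address": "10.0.0.5", "mask": "255.255.255.252", "cost": 64},
]

config = render_interfaces(rows)
print(config)
```

The point of the design is that the database, not a hand-edited file, is
the source of truth, so a missing or inconsistent field is caught before
anything is loaded onto a router.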

>Maybe this is possible, if the gear being deployed never has a
>hardware failure, the OS never fails, your fiber never gets dug up, and
>device misconfigurations never happen.

You can get into such things as redundant processors with majority 
voter logic, topology checkers for dealing with Byzantine corruption, 
etc.  Again, not economically feasible for low-end enterprise gear.
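
For readers unfamiliar with majority voter logic, the core idea can be
sketched in a few lines of Python. This is only an illustration of the
voting step, assuming each redundant processor reports a comparable result
(here a next-hop string, which is a made-up example); real implementations
do this in hardware or tightly coupled software.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value reported by a strict majority of replicas, else None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    # A strict majority means more than half of all replicas agree.
    return value if count * 2 > len(outputs) else None


# Three replicas, one of which has produced a corrupted result.
print(majority_vote(["10.1.1.1", "10.1.1.1", "10.9.9.9"]))
# No two replicas agree, so there is no trustworthy answer.
print(majority_vote(["10.1.1.1", "10.2.2.2", "10.9.9.9"]))
```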

>If you are seeing gear which never fails, a carrier which never loses
>fiber, and operations folks who never make mistakes, let me know what
>vendors I should be switching to or entity I should be hiring
>from...  :-)

There's much wisdom both in military and traditional telco networks 
designed to degrade rather than hard-fail.  Some interesting papers 
on survivability are at the Carnegie Mellon Software Engineering 
Institute, reachable off the www.cert.org webpage.

>
>In a post yesterday, you mentioned CALEA and E911. Good, let's think
>about primary line VOIP and OSPF as your IGP. Let's assume that customer
>downtime for VOIP is a bad thing and something the operator is trying to
>avoid. Thus, it's crucial for the NM folks to be able to detect problems
>before pagers start buzzing and before the call center gets whacked....
>
>Given this, how can NM tools determine that all links which should have
>OSPF adjacencies active in fact do? I've seen situations where this
>sort of problem doesn't get realized until there's a failure in one part
>of the network.

One trick I've used, which is really not much more than a hack, is 
periodically dumping the LSDB and doing diffs between consecutive 
snapshots. With some extra code that rules out false alarms due to 
known changes, this can be simpler than it looks -- but I've rarely 
seen anyone do it.  Even keeping a weekly hard copy of the LSDB at the 
NOC can be enormously helpful.
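
A minimal Python sketch of the snapshot diff, under stated assumptions:
the one-LSA-per-line dump format ("type link-state-id advertising-router")
is hypothetical, and real "show" output would need its own parser. LSAs are
keyed on identity only, because ages and sequence numbers change on every
refresh and would otherwise show up as noise.

```python
def parse_lsdb(dump_text):
    """Parse a simplified dump with one LSA per line (hypothetical format)."""
    lsas = set()
    for line in dump_text.splitlines():
        fields = line.split()
        if len(fields) == 3:  # type, link-state ID, advertising router
            lsas.add(tuple(fields))
    return lsas


def diff_lsdb(previous, current, known_changes=frozenset()):
    """Return (appeared, vanished) LSAs, ignoring any listed as known changes."""
    appeared = (current - previous) - known_changes
    vanished = (previous - current) - known_changes
    return sorted(appeared), sorted(vanished)


yesterday = parse_lsdb("""\
router 10.0.0.1 10.0.0.1
router 10.0.0.2 10.0.0.2
network 10.1.1.2 10.0.0.2
""")
today = parse_lsdb("""\
router 10.0.0.1 10.0.0.1
router 10.0.0.3 10.0.0.3
network 10.1.1.2 10.0.0.2
""")

# Changes the NOC already expects (e.g. a router added in a planned window).
planned = {("router", "10.0.0.3", "10.0.0.3")}

appeared, vanished = diff_lsdb(yesterday, today, planned)
print("unexpected new LSAs:", appeared)
print("unexpected missing LSAs:", vanished)
```

Here the planned addition is filtered out, and only the router LSA that
disappeared without explanation is reported.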

>The backup path with the adjacency problem, which wasn't needed during
>normal operation, then causes an outage. There are OIDs in the OSPF MIB
>or syslog messages which one can use to help determine when an adjacency
>is improperly down, but this information is not covered in the standard
>network book.
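
One way to act on the MIB objects mentioned above is to compare the
adjacencies the design says should exist against polled ospfNbrState
values. The sketch below assumes the polling has already been done by
whatever SNMP tool is in use; the router names and neighbor addresses are
made up. The OID and the full(8) value are from the standard OSPF MIB.

```python
OSPF_NBR_STATE_OID = "1.3.6.1.2.1.14.10.1.6"  # ospfNbrState column, OSPF-MIB
FULL = 8  # ospfNbrState value full(8)


def missing_adjacencies(expected, polled):
    """Return (pair, state) for each expected adjacency that is not full.

    expected: set of (router, neighbor_ip) pairs taken from the design.
    polled:   dict mapping such pairs to the polled ospfNbrState integer.
    """
    problems = []
    for pair in sorted(expected):
        state = polled.get(pair)  # None: no row at all in the neighbor table
        if state != FULL:
            problems.append((pair, state))
    return problems


# Hypothetical design data and poll results.
expected = {("r1", "10.0.0.2"), ("r1", "10.0.1.2"), ("r2", "10.0.0.1")}
polled = {("r1", "10.0.0.2"): 8, ("r2", "10.0.0.1"): 4}

for pair, state in missing_adjacencies(expected, polled):
    print(pair, "state:", state)
```

One caveat: on a broadcast segment, two routers that are neither DR nor
BDR legitimately stay in twoWay(4), so the expected set should list only
pairs that are supposed to reach full.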

A lot of mechanisms for this are evolving in the (G)MPLS failover 
work.  See, for example:

http://www.ietf.org/internet-drafts/draft-ietf-mpls-recovery-frmwrk-08.txt 
http://www.ietf.org/internet-drafts/draft-ietf-mpls-bundle-04.txt
http://www.ietf.org/internet-drafts/draft-ietf-mpls-rsvp-lsp-fastreroute-01.txt
http://www.ietf.org/internet-drafts/draft-ietf-mpls-lsp-ping-01.txt


>Sure, knowing "debug ip ospf XYZ" commands is a start, and useful for
>newbies, but there's more to support than running debug commands, and
>there's always the risk that you've just blown up the router you
>turned debug on....
>
>And as you mention, there are things that would be useful to know
>through the MIB, but which aren't currently supported. That doesn't mean
>they're not worth talking about. One item that I ran into was related to
>the use of "auto-cost reference-bandwidth" to change the metric used to
>cost out links. It's important that all devices use the same reference
>bandwidth in order for costs to be properly computed. How does one
>verify that all devices, across vendors, are using the same reference
>bandwidth? Turns out that this one is not possible via the OSPF MIB as
>it stands today, as the reference bandwidth is not an object in the MIB,
>but is just a *comment* in the MIB definition.

And the comment about bandwidth doesn't even make it into the protocol RFC.
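
To make the reference bandwidth problem concrete, here is a small Python
sketch. It assumes the common convention of cost = reference bandwidth /
interface bandwidth, rounded down with a minimum of 1. Since the value is
not in the MIB, the per-router values are assumed to have been pulled from
configs by some other means; the router names are hypothetical.

```python
from collections import Counter


def ospf_cost(reference_bps, interface_bps):
    """Reference bandwidth over interface bandwidth, floored, minimum 1."""
    return max(1, reference_bps // interface_bps)


def reference_outliers(reference_by_router):
    """Return routers whose reference bandwidth differs from the most common one."""
    most_common, _ = Counter(reference_by_router.values()).most_common(1)[0]
    return {r: v for r, v in reference_by_router.items() if v != most_common}


GIG = 10**9

# The same gigabit link costed with two different reference bandwidths.
print(ospf_cost(10**8, GIG))   # 1
print(ospf_cost(10**10, GIG))  # 10

# Hypothetical values extracted from device configs, in bits per second.
refs = {"r1": 10**10, "r2": 10**10, "r3": 10**8}
print(reference_outliers(refs))  # {'r3': 100000000}
```

With a 100 Mbps reference, the gigabit link costs 1, the same as a
100 Mbps link, while routers using a 10 Gbps reference cost it at 10, so
the routers disagree about which paths are cheap. The outlier check simply
flags whichever routers depart from the majority setting.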

>
>Much like NRF mentioned, which led me to spin off this new thread -- as
>NM tools get more sophisticated, there will be less need for the CCNX
>support engineer who carries a pager to figure out problems in the
>middle of the night. Instead, more and more of the operational support
>work will be done up front as part of the design engineering, and this
>will include the OIDs and thresholds the NM folks and tools should be
>monitoring.




Message Posted at:
http://www.groupstudy.com/form/read.php?f=7&i=60252&t=60220
