Re: Teaching/developing troubleshooting skills

2004-06-29 Thread Bruce Pinsky
[EMAIL PROTECTED] wrote:
|>>It's also important that one avoid:
|>>
|>>* The faulty assumption there is but one problem
|
|
| Here's an interesting example that I came across
| several years ago. It was in an office with lots
| of PCs plugged into RJ45 10BASE-T ports near each desk.
| One PC had lost connectivity.
|
| I came and checked that the software was
| installed and running. Probably did something
| like ping 127.0.0.1 to satisfy myself that it
| wasn't a problem on the PC itself. Then I unplugged
| the cable from the RJ45 port in the wall and tried
| another port. It still did not work. I swapped
| in a new cable and it worked fine.
|
| Most people would stop right there, but I
| followed up and tested the existing cable
| in the lab. It worked just fine. Why did
| it not work before? There must be some problem
| with the switch or the wall wiring and somehow
| two RJ45 ports did not work. After a bit of
| poking and discussions with the employee at
| that desk, it turned out that the cable lay
| in a bad spot and often got caught on her foot
| as she rushed off somewhere. It turns out that
| the little metal pins inside the RJ45 socket
| had been bent. It was just sheer luck that
| swapping the cable caused contact to be made again.
| And the second socket was also bent. When that
| one ceased to work the employee had swapped
| cables themselves.
|
| The real solution was to replace both sockets
| and install a longer patch cable that could be
| placed where feet would not get caught up in it.
|
| Troubleshooting is made easier by methodically
| doing the work and following through. If I had
| not had the lab handy I probably would have
| swapped the "bad" cable back in to verify that
| "trouble" accompanied the cable. But it is also
| easier to troubleshoot when you have a stock of
| interesting war stories in your memory to encourage
| you to "think outside the box". It's the blend of
| creativity and methodical work practices that makes
| a good troubleshooter, technical or otherwise.
|
You've described Closed Loop Corrective Action to a T.  It's not enough
to know what the problem is; you also need to know how to correct it and
what to do to prevent it in the future.
--
bep


Re: Teaching/developing troubleshooting skills

2004-06-28 Thread Eric Brunner-Williams

>* The faulty assumption there is but one problem
>* Incorrectly-formed causal relationships

Mythology.

Some may recall the adventures of the CTO who ran a sweep of net 10.*
in a rather modest machine room somewhere in Maine, resulting in memory
exhaustion (ARP table) in the router, which resulted in RFC 1918 leakage
into public address space.
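
For a sense of scale, here is a small sketch of how many addresses such a
sweep covers. It assumes "net 10.*" means the whole 10.0.0.0/8 block; the
post does not say how much of it was actually probed, nor how large the
router's ARP table was.

```python
import ipaddress

# Assumption: "net 10.*" is read as the full RFC 1918 block 10.0.0.0/8.
net = ipaddress.ip_network("10.0.0.0/8")

# Total addresses a complete sweep of the block would touch.
print(net.num_addresses)  # 16777216, i.e. 2**24
```

Even a partial sweep can ask a small router to track far more neighbors
than it was sized for.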

The operational mythology of the ever-so-security-minded folks was that
the initial and very poorly understood presenting problem was an external
act of malice, rather than a self-inflicted DoS by the security folks
themselves.

I've seen many people struggle to fit what little they know into a predefined
mythos of what could be happening, rather than starting like Sgt. Schultz,
who "knew nothing", at least until he really _knew_ it.

Eric


Re: Teaching/developing troubleshooting skills

2004-06-28 Thread Michael . Dillon

> >It's also important that one avoid:
> >
> >* The faulty assumption there is but one problem

Here's an interesting example that I came across
several years ago. It was in an office with lots
of PCs plugged into RJ45 10BASE-T ports near each desk.
One PC had lost connectivity.

I came and checked that the software was
installed and running. Probably did something
like ping 127.0.0.1 to satisfy myself that it
wasn't a problem on the PC itself. Then I unplugged
the cable from the RJ45 port in the wall and tried
another port. It still did not work. I swapped
in a new cable and it worked fine.

Most people would stop right there, but I
followed up and tested the existing cable 
in the lab. It worked just fine. Why did
it not work before? There must be some problem
with the switch or the wall wiring and somehow
two RJ45 ports did not work. After a bit of 
poking and discussions with the employee at
that desk, it turned out that the cable lay
in a bad spot and often got caught on her foot
as she rushed off somewhere. It turns out that
the little metal pins inside the RJ45 socket
had been bent. It was just sheer luck that
swapping the cable caused contact to be made again.
And the second socket was also bent. When that
one ceased to work the employee had swapped 
cables themselves.

The real solution was to replace both sockets 
and install a longer patch cable that could be 
placed where feet would not get caught up in it.

Troubleshooting is made easier by methodically
doing the work and following through. If I had
not had the lab handy I probably would have
swapped the "bad" cable back in to verify that
"trouble" accompanied the cable. But it is also
easier to troubleshoot when you have a stock of
interesting war stories in your memory to encourage
you to "think outside the box". It's the blend of
creativity and methodical work practices that makes
a good troubleshooter, technical or otherwise.

--Michael Dillon



Re: Teaching/developing troubleshooting skills

2004-06-28 Thread John Neiberger

>It's also important that one avoid:
>
>* The faulty assumption there is but one problem
>* Incorrectly-formed causal relationships (NANOG-L has some
>  examples of these)
>* Making too many changes in one iteration
>* Attempting to tackle a system with more unknowns than are
>  absolutely necessary.

These words should be hanging on a wall in every IT department. You
wouldn't believe how many times I've had to gently correct someone
because of these mistakes, particularly the first two. 

John
--


Re: Teaching/developing troubleshooting skills

2004-06-26 Thread Edward B. Dreger

DG> Date: Fri, 25 Jun 2004 20:04:38 -0700
DG> From: Darrell Greenwood

[ edited for brevity ]

DG> The 5 day course can be boiled down really to one concept
DG> that can be taught in 5 minutes... "binary search".

Every half-decent programmer knows O(log(N)) is one's friend
unless the scalar coefficient is large.  A good way to
demonstrate its efficiency is:

* Have someone pick an integer between 1 and n, inclusive
* Make guesses, going "higher" or "lower" according to the
  number-holder's feedback.

The uninformed are surprised that one can always guess any number
from 1 to 1000 in ten guesses or fewer.
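
The guessing game above can be sketched as follows. This is a minimal
illustration only; the function name guess_number is made up for this
example, and the "higher"/"lower" feedback is simulated by comparing
against the secret directly.

```python
def guess_number(secret, n):
    """Find secret in 1..n by halving; return the number of guesses used."""
    low, high = 1, n
    guesses = 0
    while True:
        guesses += 1
        mid = (low + high) // 2
        if mid == secret:
            return guesses
        if mid < secret:
            low = mid + 1   # feedback: "higher"
        else:
            high = mid - 1  # feedback: "lower"


# Try every possible secret from 1 to 1000 and record the worst case.
worst_case = max(guess_number(s, 1000) for s in range(1, 1001))
print(worst_case)  # 10
```

Doubling n to 2000 adds only one more guess to the worst case, which is
the whole appeal of O(log(N)).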


DG> The reason I am writing this note is as I went through a
DG> career of troubleshooting I was surprised at the number of
DG> colleagues who had no concept of "half-splitting" and used
DG> "linear" or "random" techniques to determine test
DG> points/tests with a corresponding dramatic reduction in
DG> effectiveness.

Good point.


[ below text in response to nobody in particular ]

It's also important that one avoid:

* The faulty assumption there is but one problem
* Incorrectly-formed causal relationships (NANOG-L has some
  examples of these)
* Making too many changes in one iteration
* Attempting to tackle a system with more unknowns than are
  absolutely necessary.

A certain amount of troubleshooting can be taught, but IMHO it
requires a self-driven person with intuitive reasoning.

Finally: Apprenticeship.  Have the novices follow along when
experts work actual cases.  Part of troubleshooting is developing
the intuition to make informed guesses -- e.g.,
"some idiot broke pmtud" -- and to develop good leads without
having to search methodically through the entire problem space.


Eddy
--
EverQuick Internet - http://www.everquick.net/
A division of Brotsman & Dreger, Inc. - http://www.brotsman.com/
Bandwidth, consulting, e-commerce, hosting, and network building
Phone: +1 785 865 5885 Lawrence and [inter]national
Phone: +1 316 794 8922 Wichita



Re: Teaching/developing troubleshooting skills

2004-06-25 Thread Darrell Greenwood

On 04/6/24 at 5:09 PM -0600, Pete Kruckenberg wrote the following:

>I'm working on trying to teach others in my group (usually
>less-experienced, but not always) how to improve their
>large-network troubleshooting skills (the techniques of
>isolating a problem, etc)

I took a 5 day course in another era, fortunately for me at the
beginning of my career, on analytic trouble shooting (Kepner Tregoe?).

The 5 day course can be boiled down really to one concept that can be
taught in 5 minutes... 'half-splitting'. (The other 4.95 days were
spent making sure we understood the concept and learning to implement
it, the length of time was overkill but the course vendor had to make
money somehow :-)

(In another discussion* it was pointed out to me that that wasn't the
correct name in the writer's view... it was "binary search". Google
may have proved the writer correct, but I still refer to it as
half-splitting, as I spent a week learning to call it that :-)

The point of this note is that troubleshooting boils down to finding
the problem in the fewest steps.

Half-splitting keeps the worst-case number of steps to a minimum.

The troubleshooter's knowledge of the system and equipment provides
the ability to devise tests at the half-splitting point.
Half-splitting is a 5 minute concept. System/equipment knowledge is
of course a lifetime endeavor.
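
As a rough sketch of half-splitting on a signal chain: the code below
assumes a single fault, so every test point before the fault reads good
and every point from the fault onward reads bad. The names
first_bad_point and signal_ok, and the 16-stage chain, are hypothetical
and chosen only for illustration.

```python
def first_bad_point(points, signal_ok):
    """Return (index of the first bad test point, number of tests used).

    Assumes a single fault: points before it test good, points from it
    onward test bad, and the last point is known to be bad.
    """
    low, high = 0, len(points) - 1   # first bad point lies in [low, high]
    tests = 0
    while low < high:
        mid = (low + high) // 2
        tests += 1
        if signal_ok(points[mid]):
            low = mid + 1    # fault is downstream of mid
        else:
            high = mid       # fault is at mid or upstream of it
    return low, tests


# Hypothetical 16-stage chain with the fault introduced at stage 11.
stages = list(range(16))
index, tests = first_bad_point(stages, lambda s: s < 11)
print(index, tests)  # 11 4
```

Testing the same chain linearly from the first stage would take 12 tests
to reach stage 11. Devising a meaningful test at each midpoint is where
the system knowledge comes in.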

The reason I am writing this note is that, over a career of
troubleshooting, I was surprised at the number of colleagues who had
no concept of "half-splitting" and used "linear" or "random"
techniques to determine test points/tests, with a corresponding
dramatic reduction in effectiveness.

Cheers,

Darrell

*p.s., I just remembered where my previous discussion was;

http://db.tidbits.com/tbtalk/tlkmsg.lasso?MsgID=15775

http://db.tidbits.com/tbtalk/tlkmsg.lasso?MsgID=15787

http://db.tidbits.com/tbtalk/tlkmsg.lasso?MsgID=15788

Searching with Google for "half-splitting" will produce some useful hits.


Re: Teaching/developing troubleshooting skills

2004-06-25 Thread John Neiberger

>>> Pete Kruckenberg <[EMAIL PROTECTED]> 6/24/04 5:09:19 PM >>>
>It's been so long since I learned network troubleshooting
>techniques I can't remember how I learned them or even how I
>used to do it (so poorly).
>
>Does anyone have experience with developing a
>skills-improvement program on this topic?

I find that it's helpful to teach troubleshooting in two stages: 1)
Define the problem. 2) Isolate the problem.

For stage one, teach them the basic skillset needed to define the issue
in a general way based on available information. Is a circuit obviously
down? Are certain destinations unreachable? Are *all* destinations
unreachable? Is network access slow? You get the picture.

Once the nature of the problem is determined, I find that a layered
approach to troubleshooting is helpful and that is what I teach to
others. The exact order of steps might change based on information
learned in step one, but generally I work my way up the OSI model.

If the problem could possibly be caused by a physical layer issue, try
to determine such. Check the circuits for errors, bouncing links,
indications of mismatched clocking configurations, faulty CSU/DSUs, 
faulty router interfaces, or bad cabling. If all of that appears to be
okay then I consider the datalink layer.

Could the problem defined in step one be caused by a datalink layer
issue? Was the encapsulation changed on a router interface? If frame
relay, is the router seeing LMI from the frame relay switch? Is there
evidence of dropped frames completely within the cloud (granted, that's
not necessarily datalink layer, but it is a separate 'administrative'
layer if it's out of your control.) I'm sure you can think of a number
of other examples.

Could the problem defined in step one be the result of a network layer
issue? Is there evidence of a routing loop? Do the devices involved have
routing tables that appear to be correct? What do traceroutes and pings
show? Teach them to go hop-by-hop and verify that everything appears as
it should, starting with the device closest to the problem if it's
possible to narrow it down that far.

If routing is determined to be correct, could this be a transport layer
issue? Is it possible that an access list or firewall somewhere is
blocking only certain types of traffic? Does the problem only involve
HTTP? SMTP? Is there policy routing involved that might be redirecting
certain types of traffic to the wrong destination? Were there *any*
recent configuration changes? If so, what were they? Find out, because
they might be the cause of the problem.

This is the general framework I use for troubleshooting and that's how
I've taught the people that work with me. It's constantly evolving and,
of course, the specific steps taken depend on the nature of the issue,
but I find that it helps to have a good foundational troubleshooting
framework.
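
One way to picture the layered walk described above is as an ordered
list of checks that stops at the first layer that looks unhealthy. This
is only a sketch: first_failing_layer and the observed dictionary are
hypothetical stand-ins for real probes such as interface counters, LMI
status, traceroutes, or an access-list review.

```python
def first_failing_layer(checks):
    """Run (layer, description, check) tuples in order.

    Each check is a zero-argument callable returning True when that
    layer looks healthy. Returns (layer, description) for the first
    failure, or None when every check passes.
    """
    for layer, description, check in checks:
        if not check():
            return layer, description
    return None


# Hypothetical observations standing in for real measurements.
observed = {
    "link_up": True,
    "lmi_seen": True,
    "route_present": False,
    "acl_permits": True,
}

# Checks are listed bottom-up, following the OSI model.
checks = [
    ("physical", "circuit up, no errors", lambda: observed["link_up"]),
    ("datalink", "LMI seen from the switch", lambda: observed["lmi_seen"]),
    ("network", "route to destination present", lambda: observed["route_present"]),
    ("transport", "ACL/firewall permits the traffic", lambda: observed["acl_permits"]),
]

print(first_failing_layer(checks))  # ('network', 'route to destination present')
```

The order of the list is a default, not a rule; information from stage
one may justify starting higher up.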

John
--


Re: Teaching/developing troubleshooting skills

2004-06-24 Thread Bruce Pinsky
Pete Kruckenberg wrote:
| I'm working on trying to teach others in my group (usually
| less-experienced, but not always) how to improve their
| large-network troubleshooting skills (the techniques of
| isolating a problem, etc).
|
| It's been so long since I learned network troubleshooting
| techniques I can't remember how I learned them or even how I
| used to do it (so poorly).
|
| Does anyone have experience with developing a
| skills-improvement program on this topic? If you've tried
| such a thing, what worked/didn't work for you? Outside
| training? Books? Mentoring? Motivational posters?
|
| I'm particularly sensitive to the "I got my CCNA, therefore
| I know everything there is to know about troubleshooting"
| perspective, and how to encourage improving troubleshooting
| skills without making it insultingly basic.
|
If you are looking for some courses on just analytical troubleshooting
and/or problem solving techniques, you might want to look at the Kepner
Tregoe stuff (www.kepner-tregoe.com).  It is not network specific but
rather teaches techniques.  Some of their courses include:

* Problem Solving and Decision Making
* Analytic Trouble Shooting
* Implementing Corrective and Preventive Actions

--
bep


Re: Teaching/developing troubleshooting skills

2004-06-24 Thread Jon R. Kibler
Pete Kruckenberg wrote:
> 
> I'm working on trying to teach others in my group (usually
> less-experienced, but not always) how to improve their
> large-network troubleshooting skills (the techniques of
> isolating a problem, etc).

There are several vendors that offer these types of courses, and I am sure that if you 
search for courseware, you can find some good materials you could use to teach your 
own sessions in house.

Jon
-- 
Jon R. Kibler
Chief Technical Officer
A.S.E.T., Inc.
Charleston, SC  USA
(843) 849-8214







RE: Teaching/developing troubleshooting skills

2004-06-24 Thread Larry Pingree

Hi Pete,
If you have a test lab, a good exercise is to set up a
complete, functional network and show the engineer how it's configured.
Then have them leave the room while you break something. Send them back
in to find what is wrong. As they work through it, guide them through
the troubleshooting process in a mentoring fashion and help them
analyze and break apart the problem.

LP
 
Best Regards,
 
Larry
 
Larry Pingree

"Visionary people, are visionary, partly because of the great many
things they never get to see." - Larry Pingree

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
Pete Kruckenberg
Sent: Thursday, June 24, 2004 4:09 PM
To: [EMAIL PROTECTED]
Subject: Teaching/developing troubleshooting skills


I'm working on trying to teach others in my group (usually
less-experienced, but not always) how to improve their
large-network troubleshooting skills (the techniques of
isolating a problem, etc).

It's been so long since I learned network troubleshooting
techniques I can't remember how I learned them or even how I
used to do it (so poorly).

Does anyone have experience with developing a
skills-improvement program on this topic? If you've tried
such a thing, what worked/didn't work for you? Outside
training? Books? Mentoring? Motivational posters?

I'm particularly sensitive to the "I got my CCNA, therefore
I know everything there is to know about troubleshooting"  
perspective, and how to encourage improving troubleshooting
skills without making it insultingly basic.

Thanks for your help.
Pete.



Teaching/developing troubleshooting skills

2004-06-24 Thread Pete Kruckenberg

I'm working on trying to teach others in my group (usually
less-experienced, but not always) how to improve their
large-network troubleshooting skills (the techniques of
isolating a problem, etc).

It's been so long since I learned network troubleshooting
techniques I can't remember how I learned them or even how I
used to do it (so poorly).

Does anyone have experience with developing a
skills-improvement program on this topic? If you've tried
such a thing, what worked/didn't work for you? Outside
training? Books? Mentoring? Motivational posters?

I'm particularly sensitive to the "I got my CCNA, therefore
I know everything there is to know about troubleshooting"  
perspective, and how to encourage improving troubleshooting
skills without making it insultingly basic.

Thanks for your help.
Pete.