Re: Teaching/developing troubleshooting skills
[EMAIL PROTECTED] wrote:
|>>It's also important that one avoid:
|>>
|>>* The faulty assumption there is but one problem
|
| [...]
|
| Troubleshooting is made easier by methodically
| doing the work and following through. [...]
| It's the blend of creativity and methodical work
| practices that makes a good troubleshooter,
| technical or otherwise.

You've described Closed Loop Corrective Action to a tee. It's not enough to know what the problem is; you also need to know how to correct it and what to do to prevent it in the future.

--
bep
Re: Teaching/developing troubleshooting skills
>* The faulty assumption there is but one problem
>* Incorrectly-formed causal relationships

Mythology.

Some may recall the adventures of the CTO who ran a sweep of net 10.* in a rather modest machine room somewhere in Maine, resulting in memory exhaustion (ARP table) in the router, which in turn resulted in RFC 1918 leakage into public address space. The operational mythology of the ever-so-security-minded-yolks was that the initial and very poorly understood presenting problem was an external act of malice, rather than a self-inflicted DoS by the security-yolk itself.

I've seen many people struggle to fit what little they know into a predefined mythos of what could be happening, rather than starting like Sgt. Schultz, who "knew nothing", at least until he really _knew_ it.

Eric
Re: Teaching/developing troubleshooting skills
> >It's also important that one avoid:
> >
> >* The faulty assumption there is but one problem

Here's an interesting example that I came across several years ago. It was in an office with lots of PCs plugged into RJ45 10baseT ports near each desk. One PC had lost connectivity.

I came and checked that the software was installed and running. Probably did something like ping 127.0.0.1 to satisfy myself that it wasn't a problem on the PC itself. Then I unplugged the cable from the RJ45 port in the wall and tried another port. It still did not work. I swapped in a new cable and it worked fine.

Most people would stop right there, but I followed up and tested the existing cable in the lab. It worked just fine. Why did it not work before? There must be some problem with the switch or the wall wiring and somehow two RJ45 ports did not work. After a bit of poking and discussions with the employee at that desk, it turned out that the cable lay in a bad spot and often got caught on her foot as she rushed off somewhere. It turns out that the little metal pins inside the RJ45 socket had been bent. It was just sheer luck that swapping the cable caused contact to be made again. And the second socket was also bent. When that one ceased to work the employee had swapped cables themselves.

The real solution was to replace both sockets and install a longer patch cable that could be placed where feet would not get caught up in it.

Troubleshooting is made easier by methodically doing the work and following through. If I had not had the lab handy I probably would have swapped the "bad" cable back in to verify that "trouble" accompanied the cable. But it is also easier to troubleshoot when you have a stock of interesting war stories in your memory to encourage you to "think outside the box". It's the blend of creativity and methodical work practices that makes a good troubleshooter, technical or otherwise.

--Michael Dillon
Re: Teaching/developing troubleshooting skills
>It's also important that one avoid:
>
>* The faulty assumption there is but one problem
>* Incorrectly-formed causal relationships (NANOG-L has some
>  examples of these)
>* Making too many changes in one iteration
>* Attempting to tackle a system with more unknowns than are
>  absolutely necessary.

These words should be hanging on a wall in every IT department. You wouldn't believe how many times I've had to gently correct someone because of these mistakes, particularly the first two.

John
Re: Teaching/developing troubleshooting skills
DG> Date: Fri, 25 Jun 2004 20:04:38 -0700
DG> From: Darrell Greenwood

[ edited for brevity ]

DG> The 5 day course can be boiled down really to one concept
DG> that can be taught in 5 minutes... "binary search".

Every half-decent programmer knows O(log(N)) is one's friend unless the scalar coefficient is large. A good way to demonstrate its efficiency is:

* Have someone pick an integer between 1 and n, inclusive
* Make guesses, going "higher" or "lower" according to the
  number-holder's feedback.

The uninformed are surprised that one can always guess a number from 1 to 1000 in ten guesses or fewer.

DG> The reason I am writing this note is as I went through a
DG> career of troubleshooting I was surprised at the number of
DG> colleagues who had no concept of "half-splitting" and used
DG> "linear" or "random" techniques to determine test
DG> points/tests with a corresponding dramatic reduction in
DG> effectiveness.

Good point.

[ below text in response to nobody in particular ]

It's also important that one avoid:

* The faulty assumption there is but one problem
* Incorrectly-formed causal relationships (NANOG-L has some
  examples of these)
* Making too many changes in one iteration
* Attempting to tackle a system with more unknowns than are
  absolutely necessary.

A certain amount of troubleshooting can be taught, but IMHO it requires a self-driven person with intuitive reasoning.

Finally: Apprenticeship. Have the novices follow along when experts work actual cases. A certain amount of troubleshooting is developing the intuition to make informed guesses -- e.g., "some idiot broke pmtud" -- and develop good leads without having to search methodically through the entire problem space.

Eddy
--
EverQuick Internet - http://www.everquick.net/
A division of Brotsman & Dreger, Inc. - http://www.brotsman.com/
Bandwidth, consulting, e-commerce, hosting, and network building
Phone: +1 785 865 5885 Lawrence and [inter]national
Phone: +1 316 794 8922 Wichita
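The guessing game described above is easy to try out. What follows is only a minimal sketch in Python; the function names guess_number and worst_case are made up for this example. It plays the game against every possible secret from 1 to n and reports the largest number of guesses ever needed.

```python
def guess_number(secret: int, n: int) -> int:
    """Return how many guesses a halving strategy needs to hit `secret` in 1..n."""
    low, high = 1, n
    guesses = 0
    while low <= high:
        guess = (low + high) // 2      # always guess the middle of what is left
        guesses += 1
        if guess == secret:
            return guesses
        if guess < secret:             # number-holder says "higher"
            low = guess + 1
        else:                          # number-holder says "lower"
            high = guess - 1
    raise ValueError("secret was not in 1..n")


def worst_case(n: int) -> int:
    """Largest number of guesses needed over every possible secret in 1..n."""
    return max(guess_number(secret, n) for secret in range(1, n + 1))


print(worst_case(1000))   # -> 10
```

Ten guesses are enough because ten halvings can distinguish up to 1023 candidates; a range of 1 to 1024 would need an eleventh.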
Re: Teaching/developing troubleshooting skills
On 04/6/24 at 5:09 PM -0600, Pete Kruckenberg wrote the following :

>I'm working on trying to teach others in my group (usually
>less-experienced, but not always) how to improve their
>large-network troubleshooting skills (the techniques of
>isolating a problem, etc)

I took a 5 day course in another era, fortunately for me at the beginning of my career, on analytic trouble shooting (Kepner Tregoe?).

The 5 day course can be boiled down really to one concept that can be taught in 5 minutes... 'half-splitting'.

(The other 4.95 days were spent making sure we understood the concept and learning to implement it, the length of time was overkill but the course vendor had to make money somehow :-)

(In another discussion* it was pointed out to me that that wasn't the correct name in the writer's view... it was "binary search". Google may have proved the writer correct, but I still refer to it as half-splitting as I spent a week learning to call it that :-)

The point of this note is that troubleshooting boils down to finding the problem in the fewest steps. Half-splitting ensures the number of steps is at a minimum. The troubleshooter's knowledge of the system and equipment provides the ability to devise tests at the half-splitting point.

Half-splitting is a 5 minute concept. System/equipment knowledge is of course a lifetime endeavor.

The reason I am writing this note is as I went through a career of troubleshooting I was surprised at the number of colleagues who had no concept of "half-splitting" and used "linear" or "random" techniques to determine test points/tests with a corresponding dramatic reduction in effectiveness.

Cheers,

Darrell

*p.s., I just remembered where my previous discussion was;

http://db.tidbits.com/tbtalk/tlkmsg.lasso?MsgID=15775
http://db.tidbits.com/tbtalk/tlkmsg.lasso?MsgID=15787
http://db.tidbits.com/tbtalk/tlkmsg.lasso?MsgID=15788

Searching with Google for "half-splitting" will produce some useful hits.
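Half-splitting along a chain of test points can be sketched as below. This is an illustration under stated assumptions, not a description of the course material: the names half_split and signal_ok are hypothetical, the hop names are invented, and the sketch assumes exactly one fault, so that every point upstream of it tests good and every point from it onward tests bad. If there is more than one problem, that assumption fails and the result can mislead.

```python
from typing import Callable, Optional, Sequence, Tuple


def half_split(points: Sequence[str],
               signal_ok: Callable[[str], bool]) -> Tuple[Optional[str], int]:
    """Find the first test point where the signal is bad.

    Assumes a single fault. Returns (point, tests_run), or
    (None, tests_run) when every point tests good.
    """
    low, high = 0, len(points)     # first bad index lies in [low, high]
    tests = 0
    while low < high:
        mid = (low + high) // 2    # test at the half-way point
        tests += 1
        if signal_ok(points[mid]):
            low = mid + 1          # fault is downstream of mid
        else:
            high = mid             # fault is at mid or upstream of it
    found = points[low] if low < len(points) else None
    return found, tests


# Hypothetical 16-hop path with the fault at the 11th hop.
hops = [f"hop{i}" for i in range(1, 17)]
stub_test = lambda hop: hops.index(hop) < 10   # stand-in for a real measurement

print(half_split(hops, stub_test))   # -> ('hop11', 4)
```

A linear walk from hop1 would have needed eleven tests to reach the same answer. In practice the hard part is the one the message points to: knowing the system well enough to devise a test at the half-way point.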
Re: Teaching/developing troubleshooting skills
>>> Pete Kruckenberg <[EMAIL PROTECTED]> 6/24/04 5:09:19 PM >>>
>It's been so long since I learned network troubleshooting
>techniques I can't remember how I learned them or even how I
>used to do it (so poorly).
>
>Does anyone have experience with developing a
>skills-improvement program on this topic?

I find that it's helpful to teach troubleshooting in two stages:

1) Define the problem.
2) Isolate the problem.

For stage one, teach them the basic skillset needed to define the issue in a general way based on available information. Is a circuit obviously down? Are certain destinations unreachable? Are *all* destinations unreachable? Is network access slow? You get the picture.

Once the nature of the problem is determined, I find that a layered approach to troubleshooting is helpful and that is what I teach to others. The exact order of steps might change based on information learned in stage one, but generally I work my way up the OSI model.

If the problem could possibly be caused by a physical layer issue, try to determine whether it is. Check the circuits for errors, bouncing links, indications of mismatched clocking configurations, faulty CSU/DSUs, faulty router interfaces, or bad cabling.

If all of that appears to be okay then I consider the datalink layer. Could the problem defined in stage one be caused by a datalink layer issue? Was the encapsulation changed on a router interface? If frame relay, is the router seeing LMI from the frame relay switch? Is there evidence of dropped frames completely within the cloud (granted, that's not necessarily datalink layer, but it is a separate 'administrative' layer if it's out of your control)? I'm sure you can think of a number of other examples.

Could the problem defined in stage one be the result of a network layer issue? Is there evidence of a routing loop? Do the devices involved have routing tables that appear to be correct? What do traceroutes and pings show? Teach them to go hop-by-hop and verify that everything appears as it should, starting with the device closest to the problem if it's possible to narrow it down that far.

If routing is determined to be correct, could this be a transport layer issue? Is it possible that an access list or firewall somewhere is blocking only certain types of traffic? Does the problem only involve HTTP? SMTP? Is there policy routing involved that might be redirecting certain types of traffic to the wrong destination?

Were there *any* recent configuration changes? If so, what were they? Find out, because they might be the cause of the problem.

This is the general framework I use for troubleshooting and that's how I've taught the people that work with me. It's constantly evolving and, of course, the specific steps taken depend on the nature of the issue, but I find that it helps to have a good foundational troubleshooting framework.

John
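The layered walk described above can be written down as an ordered checklist. The Python below is only a sketch of that idea: the names LAYERS and first_suspect_layer are hypothetical, and each check is a placeholder lambda returning a fixed answer where a real version would look at interface counters, LMI status, routing tables and so on.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Layers in the order the message walks them, lowest first.
LAYERS: List[str] = ["physical", "datalink", "network", "transport"]

# A check is (description, function returning True when things look okay).
Check = Tuple[str, Callable[[], bool]]


def first_suspect_layer(checks: Dict[str, List[Check]]) -> Optional[Tuple[str, str]]:
    """Walk up the layers and return (layer, description) of the first failed check."""
    for layer in LAYERS:
        for description, looks_ok in checks.get(layer, []):
            if not looks_ok():
                return layer, description
    return None   # nothing failed; revisit the problem definition


# Placeholder answers standing in for real observations.
example_checks: Dict[str, List[Check]] = {
    "physical": [("circuit free of errors", lambda: True),
                 ("link not bouncing", lambda: True)],
    "datalink": [("router sees LMI from the frame relay switch", lambda: False)],
    "network": [("routing tables appear correct", lambda: True)],
}

print(first_suspect_layer(example_checks))
# -> ('datalink', 'router sees LMI from the frame relay switch')
```

Stopping at the first failed check keeps the number of unknowns small, but it reports only the lowest-layer failure; once that one is fixed, the walk should be run again in case there is more than one problem.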
Re: Teaching/developing troubleshooting skills
Pete Kruckenberg wrote:
| I'm working on trying to teach others in my group (usually
| less-experienced, but not always) how to improve their
| large-network troubleshooting skills (the techniques of
| isolating a problem, etc).
|
| It's been so long since I learned network troubleshooting
| techniques I can't remember how I learned them or even how I
| used to do it (so poorly).
|
| Does anyone have experience with developing a
| skills-improvement program on this topic? If you've tried
| such a thing, what worked/didn't work for you? Outside
| training? Books? Mentoring? Motivational posters?
|
| I'm particularly sensitive to the "I got my CCNA, therefore
| I know everything there is to know about troubleshooting"
| perspective, and how to encourage improving troubleshooting
| skills without making it insultingly basic.
|

If you are looking for some courses on just analytical troubleshooting and/or problem solving techniques, you might want to look at the Kepner Tregoe stuff (www.kepner-tregoe.com). It is not network specific but rather teaches techniques.

Some of their courses include:

Problem Solving and Decision Making
Analytic Trouble Shooting
Implementing Corrective and Preventive Actions

--
bep
Re: Teaching/developing troubleshooting skills
Pete Kruckenberg wrote:
>
> I'm working on trying to teach others in my group (usually
> less-experienced, but not always) how to improve their
> large-network troubleshooting skills (the techniques of
> isolating a problem, etc).

There are several vendors that offer these types of courses, and I am sure that if you search for courseware, you can find some good materials you could use to teach your own sessions in house.

Jon
--
Jon R. Kibler
Chief Technical Officer
A.S.E.T., Inc.
Charleston, SC USA
(843) 849-8214
RE: Teaching/developing troubleshooting skills
Hi Pete,

If you have a test lab, a good thing would be to set up a complete functional network. Show the engineer how it's configured. Then have them leave the room while you break it. Send them back in to find what is wrong. As they work through it, mentor them: guide them through the troubleshooting process and help them analyze and break apart the problem.

LP

Best Regards,

Larry

Larry Pingree

"Visionary people, are visionary, partly because of the great many things they never get to see." - Larry Pingree

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Pete Kruckenberg
Sent: Thursday, June 24, 2004 4:09 PM
To: [EMAIL PROTECTED]
Subject: Teaching/developing troubleshooting skills

I'm working on trying to teach others in my group (usually less-experienced, but not always) how to improve their large-network troubleshooting skills (the techniques of isolating a problem, etc).

It's been so long since I learned network troubleshooting techniques I can't remember how I learned them or even how I used to do it (so poorly).

Does anyone have experience with developing a skills-improvement program on this topic? If you've tried such a thing, what worked/didn't work for you? Outside training? Books? Mentoring? Motivational posters?

I'm particularly sensitive to the "I got my CCNA, therefore I know everything there is to know about troubleshooting" perspective, and how to encourage improving troubleshooting skills without making it insultingly basic.

Thanks for your help.

Pete.
Teaching/developing troubleshooting skills
I'm working on trying to teach others in my group (usually less-experienced, but not always) how to improve their large-network troubleshooting skills (the techniques of isolating a problem, etc).

It's been so long since I learned network troubleshooting techniques I can't remember how I learned them or even how I used to do it (so poorly).

Does anyone have experience with developing a skills-improvement program on this topic? If you've tried such a thing, what worked/didn't work for you? Outside training? Books? Mentoring? Motivational posters?

I'm particularly sensitive to the "I got my CCNA, therefore I know everything there is to know about troubleshooting" perspective, and how to encourage improving troubleshooting skills without making it insultingly basic.

Thanks for your help.

Pete.