> Automated config deployment / provisioning. And sanity checking
> before deployment.

Easy to say, not so easy to do. For instance, that incorrect port was identified by a number or name. Theoretically, if an automated tool pulls the number/name from a database and issues the command, then the error cannot happen. But how does the number/name get into the database? I've seen a situation where a human being enters that number, copying it from another application screen. We hope that it is done by copy/paste every time, but who knows? And even copy/paste can go wrong if the selection is made with the mouse by someone who isn't paying enough attention.

But wait! How did the other application come up with that number for copying? Actually, it was copy-pasted from yet a third application, and that application got it by copy/paste from a spreadsheet. It is easy to create a tangled mess of OSS applications that are glued together by lots of manual human effort, creating numerous opportunities for human error.

So while I wholeheartedly support automation of network configuration, it is not a magic bullet. You also need to pay attention to the whole process, the whole chain of information flow.

And there are other things that may be even more effective, such as hiding your human errors. This is commonly called a "maintenance window", and it involves an absolute ban on making any network change, no matter how trivial, outside of a maintenance window. The human error can still occur, but because it happens in a maintenance window, the customer either doesn't notice or, if it is planned maintenance, doesn't complain, because they are expecting a bit of disruption and have agreed to the planned maintenance window. That only leaves break-fix work, which is where the most skilled and trusted engineers work on the live network outside of maintenance windows to fix stuff that is seriously broken.
It sounds like the event in the original posting was something like that, but perhaps not, because this kind of break-fix work should only be done when there is already a customer-affecting issue. By the way, even break-fix changes can, and should, be tested in a lab environment before you push them onto the network.

--Michael Dillon
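
As a rough illustration of the two safeguards discussed above (checking a port identifier against a database before issuing a command, and refusing non-break-fix changes outside a maintenance window), here is a minimal sketch. Everything in it is an assumption made up for the example: the `INVENTORY` contents, the device and port names, the 01:00-05:00 window, and the `ChangeRequest` and `sanity_check` names. It is not taken from any real provisioning tool.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class ChangeRequest:
    device: str
    port: str
    # True only for emergency repair of an existing customer-affecting fault.
    break_fix: bool = False


# Hypothetical inventory database: which ports exist on which device.
INVENTORY = {"edge1": {"ge-0/0/1", "ge-0/0/2"}}

# Hypothetical nightly maintenance window, 01:00 to 05:00 local time.
WINDOW_START = time(1, 0)
WINDOW_END = time(5, 0)


def in_maintenance_window(now: datetime) -> bool:
    """True if 'now' falls inside the (assumed) nightly window."""
    return WINDOW_START <= now.time() < WINDOW_END


def sanity_check(req: ChangeRequest, now: datetime) -> list[str]:
    """Return reasons to reject the change; an empty list means it may proceed."""
    problems = []
    if req.port not in INVENTORY.get(req.device, set()):
        problems.append(f"port {req.port!r} not found on device {req.device!r}")
    if not req.break_fix and not in_maintenance_window(now):
        problems.append("outside maintenance window and not break-fix")
    return problems
```

Note what this does not catch: a mistyped port that happens to match another real port on the same device passes the inventory check, and the check is only as good as the data that was entered into the inventory in the first place. That is the point made above about the whole chain of information flow.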